Abstract

A general framework for analyzing estimates in nonlinear time series models is developed. Ergodic strictly stationary series are treated. General conditions for strong consistency and asymptotic normality are derived both for conditional least squares and maximum likelihood type estimates. Examples are taken from exponential autoregressive, random coefficient autoregressive and bilinear time series models. Some nonstationary models and examples are treated in a sequel to this paper.
Introduction

Recently there has been a growing interest in nonlinear time series models. Some representative references are Andel (1976) and Nicholls and Quinn (1982) on random coefficient autoregressive models, Granger and Andersen (1978) and Subba Rao (1981) on bilinear models, Haggan and Ozaki (1981) on exponential autoregressive models, Tong and Lim (1980) on threshold autoregressive models, Harrison and Stevens (1976) and Ledolter (1981) on dynamic state space models, and Priestley (1980) on general state dependent models. A review has been given in Tjøstheim (1984a).

To be able to use nonlinear time series models in practice, one must be able to fit the models to data and estimate the parameters. Computational procedures for determining parameters for various model classes are outlined in the above references. Often these are based on the minimization of a least squares or a maximum likelihood type criterion. However, very little is known about the theoretical properties of these procedures and the resulting estimates. An exception is the class of random coefficient autoregressive processes, for which a fairly extensive theory of estimation exists (Nicholls and Quinn 1982). See also the special models treated by Robinson (1977) and Aase (1983). Sometimes properties like consistency and asymptotic normality appear to be taken for granted for other model classes as well, but some of the simulations performed indicate that there are reasons for being cautious.

In this paper we will try to develop a more systematic approach and discuss a general framework for nonlinear time series estimation. This enables us to survey known results with new proofs as well as to obtain a number of new results. The approach is based on a Taylor expansion of a general penalty function, which is subsequently specialized to a conditional least squares and a maximum likelihood type criterion. Klimko and Nelson (1978) have previously considered such Taylor expansions in the conditional least squares case in a general (non-time series) context.

Our approach yields the estimation results of Nicholls and Quinn (1982) as special cases, and, in fact, we are able to weaken their conditions in the maximum likelihood case. The results derived are also applicable to other classes of nonlinear time series. Although the conditions for consistency and asymptotic normality are not always easy to verify, they seem to give a good indication of the specific problems that arise for each class of series. They also suggest that for some models quite strong assumptions may be needed, and thus that there are situations where taking consistency and asymptotic normality for granted may lead one astray.

We have found it convenient to subdivide our results into two papers. In the present paper we study strictly stationary ergodic series. This allows us to use the ergodic theorem and the central limit theorem for ergodic strictly stationary martingale differences (Billingsley 1961). The assumption of strict stationarity may appear overly restrictive from a practical point of view, and in some cases it certainly is. However, it should be realized that a strictly stationary nonlinear model is capable of producing realizations with a distinctly nonstationary appearance (cf. e.g. Nicholls and Quinn 1982, Sec. 1, and Tjøstheim 1984a, Sec. 5.1).

An outline of the paper is as follows. In Section 2 we present some results on consistency and asymptotic normality using a general penalty function. In Sections 3 and 5 we specialize to conditional least squares and to a maximum likelihood type penalty function, respectively. Applications of our results to a wide range of examples of nonlinear time series are given in Sections 4 and 6.

In the sequel to this paper (Tjøstheim 1984b) we look at some nonstationary models, again basing our results on a Taylor expansion of a general penalty function. A number of additional examples are given in that paper.

2. Two results on consistency and asymptotic normality.

The two results to be stated in this section will be formulated without requiring stationarity, since versions of them will also be used in Tjøstheim (1984b).

Let $\{X_t, t \in I\}$ be a discrete time stochastic process taking values in $R^d$ and defined on a probability space $(\Omega, F, P)$. The index set $I$ is either the set $Z$ of all integers or the set $N$ of all positive integers. We assume that observations $(X_1, \ldots, X_n)$ are available. We will treat the asymptotic theory of two types of estimates, namely conditional least squares and maximum likelihood type estimates. Both are obtained by minimizing a penalty function, and since, in our setting, the theory is quite similar for the two, we will formulate our results in terms of a general real-valued penalty function $Q_n(\beta) = Q_n(X_1, \ldots, X_n; \beta)$ depending on the observations and on a parameter vector $\beta$.

The parameter vector $\beta = [\beta_1, \ldots, \beta_r]^T$ will be assumed to lie in some open set $B$ of Euclidean r-space. Its true value will be denoted by $\beta^0$. We will assume that the penalty function is almost surely twice continuously differentiable in a neighborhood $S$ of $\beta^0$. We will denote by $|\cdot|$ the Euclidean norm, so that $|\beta| = (\beta^T\beta)^{1/2}$. For $\delta > 0$, we define $N_\delta = \{\beta: |\beta - \beta^0| < \delta\}$. We will use a.s. as an abbreviation for almost surely, although, when no misunderstanding can arise, it will be omitted in identities involving conditional expectations.


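Theorems 2.1 and 2.2 are proved using the standard technique of Taylor expansion around $\beta^0$ (cf. Klimko and Nelson 1978 and Hall and Heyde 1980, Ch. 6). Let $N_\delta \subset S$. Moreover, let $\partial Q_n/\partial\beta$ be the column vector defined by $\partial Q_n/\partial\beta_i$, $i = 1, \ldots, r$, and likewise let $\partial^2 Q_n/\partial\beta\partial\beta^T$ be the $r \times r$ matrix defined by $\partial^2 Q_n/\partial\beta_i\partial\beta_j$, $i, j = 1, \ldots, r$. Then

$$Q_n(\beta) = Q_n(\beta^0) + (\beta-\beta^0)^T\frac{\partial Q_n}{\partial\beta}(\beta^0) + \tfrac{1}{2}(\beta-\beta^0)^T\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta^T}(\beta^0)(\beta-\beta^0) + \tfrac{1}{2}(\beta-\beta^0)^T\Big\{\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta^T}(\beta^*) - \frac{\partial^2 Q_n}{\partial\beta\,\partial\beta^T}(\beta^0)\Big\}(\beta-\beta^0) \tag{2.1}$$

is valid for $|\beta - \beta^0| < \delta$. Here $\beta^* = \beta^*(X_1, \ldots, X_n; \beta)$ is an intermediate point between $\beta$ and $\beta^0$.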

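Theorem 2.1: Assume that $\{X_t\}$ and $\{Q_n\}$ are such that, as $n \to \infty$,

A1: $n^{-1}\,\partial Q_n(\beta^0)/\partial\beta_i \xrightarrow{\text{a.s.}} 0$, $i = 1, \ldots, r$;

A2: The symmetric matrix $\partial^2 Q_n(\beta^0)/\partial\beta\partial\beta^T$ is non-negative definite and
$$\liminf_{n\to\infty} n^{-1}\lambda^{(n)}_{\min}(\beta^0) > 0 \quad \text{a.s.},$$
where $\lambda^{(n)}_{\min}(\beta^0)$ is the smallest eigenvalue of $\partial^2 Q_n(\beta^0)/\partial\beta\partial\beta^T$;

A3:
$$\lim_{n\to\infty}\,\sup_{\delta\downarrow 0}\,(n\delta)^{-1}\Big|\frac{\partial^2 Q_n}{\partial\beta_i\,\partial\beta_j}(\beta^*) - \frac{\partial^2 Q_n}{\partial\beta_i\,\partial\beta_j}(\beta^0)\Big| < \infty \quad \text{a.s.}$$
for $i, j = 1, \ldots, r$.

Then there exists a sequence of estimators $\hat\beta_n = (\hat\beta_{n1}, \ldots, \hat\beta_{nr})^T$ such that $\hat\beta_n \xrightarrow{\text{a.s.}} \beta^0$ as $n \to \infty$, and such that for $\varepsilon > 0$ there is an event $E$ in $(\Omega, F, P)$ with $P(E) > 1 - \varepsilon$ and an $n_0$ such that on $E$ and for $n > n_0$, $\partial Q_n(\hat\beta_n)/\partial\beta_i = 0$, $i = 1, \ldots, r$, and $Q_n$ attains a relative minimum at $\hat\beta_n$.

Proof: The proof is as in Klimko and Nelson (1978), but it will be outlined to demonstrate explicitly that the argument does not depend on the special conditional least squares function used there.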


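Taking into account A1-A3 we can use Egorov's theorem. Thus for a given $\varepsilon > 0$ we can find an $E \in F$ with $P(E) > 1 - \varepsilon$, a positive $\delta^* < \delta$, an $M > 0$, a $\lambda > 0$ and an $n_1$ such that on $E$ and for $n > n_1$ we have, for $\beta \in N_{\delta^*}$,

$$\Big|(\beta-\beta^0)^T\frac{\partial Q_n}{\partial\beta}(\beta^0)\Big| < n(\delta^*)^3,$$
$$\tfrac{1}{2}(\beta-\beta^0)^T\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta^T}(\beta^0)(\beta-\beta^0) > n\lambda|\beta-\beta^0|^2,$$
$$\tfrac{1}{2}\Big|(\beta-\beta^0)^T\Big\{\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta^T}(\beta^*) - \frac{\partial^2 Q_n}{\partial\beta\,\partial\beta^T}(\beta^0)\Big\}(\beta-\beta^0)\Big| < nM(\delta^*)^3. \tag{2.2}$$

Using (2.1) and (2.2) we have that if $\beta$ is on the boundary of $N_{\delta^*}$, then

$$Q_n(\beta) > Q_n(\beta^0) + n(\delta^*)^2(\lambda - \delta^* - M\delta^*), \tag{2.3}$$

where the last term in (2.3) can be made positive by initially choosing $\delta$ sufficiently small. Hence, for such a $\delta$, $Q_n(\beta)$ must attain a minimum at some $\hat\beta_n$ in $N_{\delta^*}$, and for this $\hat\beta_n$ we must have $\partial Q_n(\hat\beta_n)/\partial\beta = 0$. The proof can now be completed as in the proof of Corollary 2.1 of Klimko and Nelson (1978) by selecting appropriate sequences $\{\varepsilon_k\}$ and $\{\delta_k\}$ tending to zero. ||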


A penalty function $Q_n$ satisfying the general conditions A1-A3 will not necessarily be useful in practice. It seems that additional constraints have to be imposed on the functional form of $Q_n$ to make it natural to choose as $\hat\beta_n$ the value of $\beta$ giving the smallest relative minimum of $Q_n$. Such properties are inherent in the conditional least squares and maximum likelihood type penalty functions. (For example, in the conditional least squares case we have $E\{Q_n(\beta^0)\} \le E\{Q_n(\beta)\}$ for all $\beta$.)


The condition A3 may not always be easy to check in practice. If $Q_n$ is almost surely three times continuously differentiable on $S$, then, using the mean value theorem, an obvious sufficient condition for A3 is the existence of an $M > 0$ independent of $\beta$ such that

A3': $$\limsup_{n\to\infty} n^{-1}\Big|\frac{\partial^3 Q_n}{\partial\beta_i\,\partial\beta_j\,\partial\beta_k}(\beta)\Big| \le M \quad \text{a.s.}$$

for $\beta \in S$ and $i, j, k = 1, \ldots, r$.

When it comes to asymptotic normality, it is essentially sufficient to prove asymptotic normality of $\partial Q_n(\beta^0)/\partial\beta$.

Theorem 2.2: Assume that the conditions of Theorem 2.1 are fulfilled and that in addition we have, as $n \to \infty$,

B1: $n^{-1}\,\partial^2 Q_n(\beta^0)/\partial\beta_i\partial\beta_j \xrightarrow{\text{a.s.}} V_{ij}$ for $i, j = 1, \ldots, r$, where $V = (V_{ij})$ is a strictly positive definite matrix, and

B2: $n^{-1/2}\,\partial Q_n(\beta^0)/\partial\beta \xrightarrow{d} N(0, W)$,

where $N(0, W)$ is used to denote a multivariate normal distribution with a zero mean vector and covariance matrix $W$. Let $\{\hat\beta_n\}$ be the estimators obtained in Theorem 2.1. Then

$$n^{1/2}(\hat\beta_n - \beta^0) \xrightarrow{d} N(0, V^{-1}WV^{-1}). \tag{2.4}$$

The proof is identical to the proof of Theorem 2.2 of Klimko and Nelson (1978) and is therefore omitted.

In the remaining part of this paper $\{X_t\}$ will be assumed to be strictly stationary and ergodic. In addition, second moments of $\{X_t\}$ will always be assumed to exist, so that $\{X_t\}$ is second order stationary as well. The task of finding nonlinear models satisfying these assumptions is far from trivial (cf. Tjøstheim 1984a, Sec. 5).

3. Conditional Least Squares

We denote by $F^X_t$ the sub σ-field of $F$ generated by $\{X_s, s \le t\}$, and we will use the notation $X_{t|t-1} = X_{t|t-1}(\beta)$ for the conditional expectation $E_\beta(X_t \mid F^X_{t-1})$. We will often omit $\beta$ for notational convenience. Since second moments of $\{X_t\}$ are assumed to exist, $X_{t|t-1}$ is the optimal one-step least squares predictor of $X_t$, and $E\{|X_t - X_{t|t-1}|^2\}$ is finite.

In the case where $\{X_t\}$ is defined for $t \ge 1$ only (this will be referred to as the one-sided case), $X_{t|t-1}$ will in general depend explicitly on $t$, and therefore the $X_{t|t-1}$ do not define a stationary process. If the index set $I$ of $\{X_t, t \in I\}$ comprises all the integers, then $X_{t|t-1}$ is stationary, but in general $X_{t|t-1}$ will depend on $X_s$'s not included in the set of observations $(X_1, \ldots, X_n)$. To avoid these problems we replace $F^X_{t-1}$ by $F^X_{t-1}(m)$, which is the σ-field generated by $\{X_s, t-m \le s \le t-1\}$, and let $X_{t|t-1} = E\{X_t \mid F^X_{t-1}(m)\}$. Here $m$ is an integer at our disposal, and we must have $t \ge m+1$ in the one-sided case.

We will use the penalty function

$$Q_n(\beta) = \sum_{t=m+1}^{n} |X_t - X_{t|t-1}(\beta)|^2 \tag{3.1}$$

and the conditional least squares estimates will be obtained by minimizing this function. In the important special case where $X_{t|t-1}$ only depends on $\{X_s, t-p \le s \le t-1\}$, i.e. $\{X_t\}$ is a nonlinear autoregressive process of order $p$, we can take $m = p$, and we have $E(X_t \mid F^X_{t-1}) = E\{X_t \mid F^X_{t-1}(m)\}$, where $t \ge m+1$ in the one-sided case. It should be noted that we may lose parameters of interest in the conditioning operation with respect to $F^X_{t-1}$ or $F^X_{t-1}(m)$, but, as will be shown in the examples of the next section, there are ways of getting around this problem.
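The penalty function (3.1) is straightforward to evaluate numerically. The following minimal sketch treats a scalar series ($d = 1$) and assumes a predictor that, as in the nonlinear AR(p) case with $m = p$, depends only on the $m$ most recent observations. The names `cls_penalty`, `Predictor` and `ar1` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Sequence

# Hypothetical predictor signature: given the parameter vector beta and the
# m most recent observations (oldest first), return X_{t|t-1}(beta).
Predictor = Callable[[Sequence[float], Sequence[float]], float]


def cls_penalty(x: Sequence[float], predictor: Predictor,
                beta: Sequence[float], m: int) -> float:
    """Q_n(beta) = sum_{t=m+1}^{n} (X_t - X_{t|t-1}(beta))^2, as in (3.1).

    Python indices are 0-based: X_t for t = m+1, ..., n is x[m], ..., x[n-1],
    and the conditioning window {X_s : t-m <= s <= t-1} of x[i] is x[i-m:i].
    """
    q = 0.0
    for i in range(m, len(x)):      # i is the 0-based index of X_{i+1}
        window = x[i - m:i]         # the m preceding observations
        resid = x[i] - predictor(beta, window)
        q += resid * resid
    return q


def ar1(beta: Sequence[float], window: Sequence[float]) -> float:
    """Linear AR(1) predictor phi * X_{t-1}, used here only as a check."""
    return beta[0] * window[-1]


if __name__ == "__main__":
    # residuals are 2 - 1 = 1 and 3 - 2 = 1, so Q_n = 2.0
    print(cls_penalty([1.0, 2.0, 3.0], ar1, [1.0], m=1))
```

Minimizing `cls_penalty` over $\beta$ with any numerical optimizer gives a conditional least squares estimate. The theorems below concern the behaviour of such minimizers, not the choice of optimizer.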

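The theorems in this section are essentially obtained by reformulating and extending the arguments of Klimko and Nelson (1978) to the multivariate case.

Theorem 3.1: Assume that $\{X_t\}$ is a d-dimensional strictly stationary ergodic process with $E(|X_t|^2) < \infty$ and such that $X_{t|t-1}(\beta) = E_\beta\{X_t \mid F^X_{t-1}(m)\}$ is almost surely three times continuously differentiable in an open set $B$ containing $\beta^0$. Moreover, suppose that

C1: $$E\Big\{\Big|\frac{\partial X_{t|t-1}}{\partial\beta_i}(\beta^0)\Big|^2\Big\} < \infty \quad\text{and}\quad E\Big\{\Big|\frac{\partial^2 X_{t|t-1}}{\partial\beta_i\,\partial\beta_j}(\beta^0)\Big|^2\Big\} < \infty$$
for $i, j = 1, \ldots, r$.

C2: The vectors $\partial X_{t|t-1}(\beta^0)/\partial\beta_i$, $i = 1, \ldots, r$, are linearly independent in the sense that if $a_1, \ldots, a_r$ are arbitrary real numbers such that
$$E\Big\{\Big|\sum_{i=1}^{r} a_i\frac{\partial X_{t|t-1}}{\partial\beta_i}(\beta^0)\Big|^2\Big\} = 0,$$
then $a_1 = a_2 = \cdots = a_r = 0$.

C3: For $\beta \in B$, there exist functions $G^{ijk}_t(X_1, \ldots, X_{t-1})$ and $H^{ijk}_t(X_1, \ldots, X_t)$ such that
$$\Big|\frac{\partial X^T_{t|t-1}}{\partial\beta_i}(\beta)\frac{\partial^2 X_{t|t-1}}{\partial\beta_j\,\partial\beta_k}(\beta)\Big| \le G^{ijk}_t, \qquad \Big|\{X_t - X_{t|t-1}(\beta)\}^T\frac{\partial^3 X_{t|t-1}}{\partial\beta_i\,\partial\beta_j\,\partial\beta_k}(\beta)\Big| \le H^{ijk}_t,$$
$$E(G^{ijk}_t) < \infty, \qquad E(H^{ijk}_t) < \infty,$$
for $i, j, k = 1, \ldots, r$.

Then there exists a sequence of estimators $\{\hat\beta_n\}$ minimizing (3.1) such that the conclusion of Theorem 2.1 holds.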
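Proof: Using (3.1) we have

$$\frac{\partial Q_n}{\partial\beta_i} = -2\sum_{t=m+1}^{n}\{X_t - X_{t|t-1}\}^T\frac{\partial X_{t|t-1}}{\partial\beta_i} = -2\sum_{t=m+1}^{n}\psi_{it}, \tag{3.2}$$

where $\psi_{it} = \psi_{it}(\beta)$ denotes the $t$-th term of the sum, and

$$E\{\psi_{it}(\beta^0)\} = E\Big[E\big[\{X_t - X_{t|t-1}(\beta^0)\}^T \mid F^X_{t-1}(m)\big]\frac{\partial X_{t|t-1}}{\partial\beta_i}(\beta^0)\Big] = 0. \tag{3.3}$$

Furthermore, from the Schwarz inequality and C1 we have $E\{|\psi_{it}(\beta^0)|\} < \infty$, and using the ergodic theorem we have that A1 of Theorem 2.1 is fulfilled.

Second order derivatives are given by

$$\frac{\partial^2 Q_n}{\partial\beta_i\,\partial\beta_j} = 2\sum_{t=m+1}^{n}\frac{\partial X^T_{t|t-1}}{\partial\beta_i}\frac{\partial X_{t|t-1}}{\partial\beta_j} - 2\sum_{t=m+1}^{n}\{X_t - X_{t|t-1}\}^T\frac{\partial^2 X_{t|t-1}}{\partial\beta_i\,\partial\beta_j}. \tag{3.4}$$

Reasoning exactly as above, we have that the expectation of the last term of (3.4) is zero for $\beta = \beta^0$. Again using the Schwarz inequality and C1, it follows from the ergodic theorem that, as $n \to \infty$,

$$n^{-1}\frac{\partial^2 Q_n}{\partial\beta_i\,\partial\beta_j}(\beta^0) \xrightarrow{\text{a.s.}} 2E\Big\{\frac{\partial X^T_{t|t-1}}{\partial\beta_i}(\beta^0)\frac{\partial X_{t|t-1}}{\partial\beta_j}(\beta^0)\Big\} = V_{ij} \tag{3.5}$$

for $i, j = 1, \ldots, r$. The matrix $V = (V_{ij})$ in (3.5) is by definition non-negative definite. That its smallest eigenvalue is larger than zero follows from C2, and this in turn implies that A2 of Theorem 2.1 is fulfilled. Finally, from C3, the ergodic theorem and the mean value theorem imply A3 of Theorem 2.1. ||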

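Let $\partial X_{t|t-1}/\partial\beta$ be the $d \times r$ matrix having $\partial X_{t|t-1}/\partial\beta_i$, $i = 1, \ldots, r$, as its column vectors. We denote by $U = \tfrac{1}{2}V$ the $r \times r$ matrix defined by

$$U = E\Big\{\frac{\partial X^T_{t|t-1}}{\partial\beta}(\beta^0)\,\frac{\partial X_{t|t-1}}{\partial\beta}(\beta^0)\Big\}. \tag{3.6}$$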


We will next use Theorem 2.2 to prove asymptotic normality. The proof depends on Billingsley's (1961) martingale central limit theorem. We then need to condition with respect to an increasing sequence of σ-fields in order to obtain a martingale, and since $\{F^X_t(m)\}$ is not increasing, we now assume the existence of an $m$ such that we have ($t \ge m+1$ in the one-sided case)

$$E(X_t \mid F^X_{t-1}) \overset{\text{a.s.}}{=} E(X_t \mid F^X_{t-1}(m)),$$
$$f_{t|t-1} = E\{(X_t - X_{t|t-1})(X_t - X_{t|t-1})^T \mid F^X_{t-1}\} \overset{\text{a.s.}}{=} E\{(X_t - X_{t|t-1})(X_t - X_{t|t-1})^T \mid F^X_{t-1}(m)\}, \tag{3.7}$$

where we have used $f_{t|t-1}$ to denote the $d \times d$ conditional prediction error matrix of $\{X_t\}$. The relations in (3.7) hold trivially for nonlinear AR processes.

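Theorem 3.2: Assume that (3.7) and the conditions of Theorem 3.1 are fulfilled. In addition assume that

D1: $$R = E\Big\{\frac{\partial X^T_{t|t-1}}{\partial\beta}(\beta^0)\,f_{t|t-1}(\beta^0)\,\frac{\partial X_{t|t-1}}{\partial\beta}(\beta^0)\Big\} < \infty.$$

Let $\{\hat\beta_n\}$ be the estimators obtained in Theorem 3.1. Then

$$n^{1/2}(\hat\beta_n - \beta^0) \xrightarrow{d} N(0, U^{-1}RU^{-1}). \tag{3.8}$$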


Proof: In view of Theorem 3.1 and (3.5) we only have to verify condition B2 of Theorem 2.2. This will be done using a Cramér-Wold type argument. Thus let $y_1, \ldots, y_r$ be arbitrary real numbers. Using the definition of $X_{t|t-1}$ we have

$$E\Big\{\sum_{i=1}^{r} y_i\,\psi_{it}(\beta^0) \,\Big|\, F^X_{t-1}\Big\} = 0, \tag{3.9}$$

and from (3.2) we conclude that the time increments of $\sum_{i=1}^{r} y_i\,\partial Q_n(\beta^0)/\partial\beta_i$ are strictly stationary ergodic martingale increments with respect to $\{F^X_t\}$. It follows from Billingsley (1961) that $n^{-1/2}\sum_{i=1}^{r} y_i\,\partial Q_n(\beta^0)/\partial\beta_i$ converges in law to a normal distribution with zero mean, and thus $n^{-1/2}\,\partial Q_n(\beta^0)/\partial\beta$ has a multivariate normal distribution as its limiting distribution.

It remains to evaluate the covariance matrix. Since $\{\partial Q_n(\beta^0)/\partial\beta_i, F^X_n\}$ is a martingale, using (3.2) it is easy to verify that

$$n^{-1}E\Big\{\frac{\partial Q_n}{\partial\beta_i}(\beta^0)\frac{\partial Q_n}{\partial\beta_j}(\beta^0)\Big\} = 4\,\frac{n-m}{n}\,E\Big[\frac{\partial X^T_{t|t-1}}{\partial\beta_i}(\beta^0)\,E\big[\{X_t - X_{t|t-1}(\beta^0)\}\{X_t - X_{t|t-1}(\beta^0)\}^T \mid F^X_{t-1}\big]\,\frac{\partial X_{t|t-1}}{\partial\beta_j}(\beta^0)\Big], \tag{3.10}$$

and by combining (2.4), (3.5), (3.6), (3.7) and D1 we obtain (3.8). ||
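In applications the covariance matrix in (3.8) must itself be estimated. One natural plug-in, sketched below under our own assumptions, replaces the expectations in (3.6) and D1 by sample averages evaluated at $\hat\beta_n$ and replaces $f_{t|t-1}$ by the squared residual. This is a standard heteroskedasticity-robust choice and is not proposed in the text. When the independence condition of Corollary 3.1 holds, the constant residual variance could be used instead. The sketch covers $d = 1$, and all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small r x r matrices)."""
    r = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(r)]
           for i, row in enumerate(a)]
    for col in range(r):
        piv = max(range(col, r), key=lambda k: abs(aug[k][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for k in range(r):
            if k != col:
                f = aug[k][col]
                aug[k] = [vk - f * vc for vk, vc in zip(aug[k], aug[col])]
    return [row[r:] for row in aug]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sandwich(grads: Sequence[Sequence[float]],
             resids: Sequence[float]) -> Matrix:
    """Plug-in estimate of U^{-1} R U^{-1} in (3.8) for d = 1.

    grads[t]  : the r-vector dX_{t|t-1}/dbeta evaluated at beta_hat
    resids[t] : X_t - X_{t|t-1}(beta_hat)
    The squared residual stands in for f_{t|t-1} (assumption of this sketch).
    """
    n, r = len(grads), len(grads[0])
    u = [[sum(g[i] * g[j] for g in grads) / n for j in range(r)]
         for i in range(r)]
    rr = [[sum(e * e * g[i] * g[j] for g, e in zip(grads, resids)) / n
           for j in range(r)] for i in range(r)]
    u_inv = inverse(u)
    return matmul(matmul(u_inv, rr), u_inv)
```

Dividing the result by $n$ gives approximate variances for the components of $\hat\beta_n$.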

For a large class of time series models (including the ordinary linear AR models), the condition D1 is implied by the condition C1 of Theorem 3.1, and hence essentially no extra condition is required to ensure asymptotic normality.

Corollary 3.1: If $X_t - X_{t|t-1}(\beta^0)$ is independent of $F^X_{t-1}$, then D1 is implied by C1.

Proof: Under the stated independence assumption we have

$$f_{t|t-1}(\beta^0) = E\big[\{X_t - X_{t|t-1}(\beta^0)\}\{X_t - X_{t|t-1}(\beta^0)\}^T\big], \tag{3.11}$$

which is a constant matrix, and the Schwarz inequality yields the conclusion. ||


A number of models were referred to in the Introduction. In Tjøstheim (1984a) these were subdivided into three main categories: models motivated by nonlinear differential equations, bilinear models and doubly stochastic models. We will try to apply the results obtained in the preceding section to one model from each category, namely an exponential autoregressive model, a bilinear model and a random coefficient autoregressive series.

For notational convenience we will omit the superscript 0 for the true value of the parameter vector in this section. Moreover, in all of the following $\{e_t, -\infty < t < \infty\}$ will denote a sequence of independent identically distributed (iid) (possibly vector) random variables with $E(e_t) = 0$ and $E(e_te_t^T) = G < \infty$.

4.1 Exponential autoregressive models

These models were introduced and studied by Ozaki (1980) and Haggan and Ozaki (1981). The point of departure is an ordinary scalar autoregressive model of order $p$ (AR(p)), where the $i$th autoregressive coefficient $\alpha_i$ is replaced by $\alpha_i(t-1, X) = \phi_i + \pi_i\exp(-\gamma X^2_{t-1})$, where $\phi_i$, $\pi_i$, $i = 1, \ldots, p$, and $\gamma$ are real parameters such that $\gamma > 0$. This results in a model

$$X_t - \sum_{i=1}^{p}\{\phi_i + \pi_i\exp(-\gamma X^2_{t-1})\}X_{t-i} = e_t, \tag{4.1}$$

which is assumed defined for $t \ge p+1$ with $X_1, \ldots, X_p$ being initial variables. Haggan and Ozaki (1981) have considered the problem of numerical evaluation of $\hat\phi_{in}$, $\hat\pi_{in}$ and $\hat\gamma_n$ by minimization of the sum of squares penalty function of (3.1), and they have done simulations. However, we are not aware of any results concerning the asymptotic properties of these estimates.

To make the principles involved more transparent we will work with the first order model

$$X_t - \{\phi + \pi\exp(-\gamma X^2_{t-1})\}X_{t-1} = e_t, \tag{4.2}$$

defined for $t \ge 2$ with $X_1$ being an initial variable.
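One way to minimize (3.1) for the first order model (4.2) uses an observation that is easy to verify from (4.2): for fixed $\gamma$ the predictor $\{\phi + \pi\exp(-\gamma X^2_{t-1})\}X_{t-1}$ is linear in $(\phi, \pi)$. The inner minimization is therefore an ordinary $2 \times 2$ least squares problem, and $\gamma$ can be profiled over a grid. The grid search, the standard normal $e_t$, the burn-in used in place of drawing $X_1$ from the invariant distribution, and the parameter values are all assumptions of this illustration. It is not the procedure of Haggan and Ozaki (1981).

```python
import math
import random


def simulate_expar1(n, phi, pi_, gamma, seed=0, burn_in=500):
    """Simulate (4.2) with standard normal e_t; the burn-in period is a
    stand-in for drawing X_1 from the invariant distribution."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = (phi + pi_ * math.exp(-gamma * x * x)) * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def fit_expar1(x, gammas):
    """Minimize (3.1) with m = 1 over (phi, pi, gamma).

    For each gamma on the grid, (phi, pi) solve a 2x2 linear least squares
    problem; the grid point with the smallest penalty is returned.
    """
    best = None
    for g in gammas:
        s11 = s12 = s22 = b1 = b2 = 0.0
        for prev, cur in zip(x[:-1], x[1:]):
            z1 = prev                                # dX_{t|t-1}/dphi
            z2 = math.exp(-g * prev * prev) * prev   # dX_{t|t-1}/dpi
            s11 += z1 * z1
            s12 += z1 * z2
            s22 += z2 * z2
            b1 += z1 * cur
            b2 += z2 * cur
        det = s11 * s22 - s12 * s12
        if abs(det) < 1e-12:
            continue                                 # (phi, pi) not separable
        phi = (s22 * b1 - s12 * b2) / det            # Cramer's rule
        pi_ = (s11 * b2 - s12 * b1) / det
        q = sum((cur - (phi + pi_ * math.exp(-g * prev * prev)) * prev) ** 2
                for prev, cur in zip(x[:-1], x[1:]))
        if best is None or q < best[0]:
            best = (q, phi, pi_, g)
    return best[1:]


x = simulate_expar1(5000, phi=0.2, pi_=0.7, gamma=1.0, seed=1)
print(fit_expar1(x, [0.05 * k for k in range(1, 61)]))
```

With $|\phi| + |\pi| = 0.9 < 1$ the simulated series satisfies the stationarity condition of Theorem 4.1. A finer grid, or a Gauss-Newton refinement started from the best grid point, would sharpen $\hat\gamma_n$.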

Theorem 4.1: Let $\{X_t\}$ be defined by (4.2). Assume that $|\phi| + |\pi| < 1$, and that $e_t$ has a density function with infinite support such that $E(e_t^2) < \infty$. Then there exists a unique distribution for the initial variable $X_1$ such that $\{X_t, t \ge 1\}$ is strictly stationary and ergodic. Moreover, there exists a sequence of estimators $\{\hat\phi_n, \hat\pi_n, \hat\gamma_n\}$ minimizing (as described in the conclusion of Theorem 2.1) the penalty function of (3.1) and such that $[\hat\phi_n, \hat\pi_n, \hat\gamma_n] \xrightarrow{\text{a.s.}} [\phi, \pi, \gamma]$, and $[\hat\phi_n, \hat\pi_n, \hat\gamma_n]$ is asymptotically normal.

Proof: Our independence assumption on $\{e_t\}$ implies that $\{X_t, t \ge 1\}$ is a Markov process, and the problem of existence of a strictly stationary and ergodic solution to the difference equation (4.2) can then be treated using Corollary 5.2 of Tweedie (1975).

Since $e_t$ has a density with infinite support, it follows that $\{X_t\}$ is irreducible with respect to Lebesgue measure (cf. Tweedie 1975). Since for an arbitrary Borel set $B$ we have

$$P(x, B) = P(X_t \in B \mid X_{t-1} = x) = P(e_t \in B - a(x)x), \tag{4.3}$$

where $a(x) = \phi + \pi\exp(-\gamma x^2)$, and since the function $a$ is continuous, it follows that $\{P(x, \cdot)\}$ is strongly continuous. Moreover, it is easily seen from (4.2) that

$$\Delta(x) = E\{(|X_t| - |X_{t-1}|) \mid X_{t-1} = x\} \le \{|a(x)| - 1\}|x| + E|e_t|. \tag{4.4}$$

Here $|a(x)| \le |\phi| + |\pi|\exp(-\gamma x^2) \le |\phi| + |\pi|$, since $\gamma \ge 0$. Let $\kappa > E(|e_t|)/(1 - |\phi| - |\pi|)$. Then, if $|\phi| + |\pi| < 1$, there exists a $c > 0$ such that $\Delta(x) \le -c$ for all $x$ with $|x| > \kappa$. Moreover, $\Delta(x)$ is bounded from above for all $x$ with $|x| \le \kappa$. It follows from Corollary 5.2 of Tweedie (1975) that there exists a unique invariant initial distribution for $X_1$ such that $\{X_t, t \ge 1\}$ is strictly stationary and ergodic.
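The drift inequality (4.4) can also be checked numerically. The sketch below assumes standard normal $e_t$, so that $E|e_t| = (2/\pi)^{1/2}$. It compares a Monte Carlo estimate of the conditional drift $E\{|X_t| - |X_{t-1}| \mid X_{t-1} = x\}$ (written $\Delta(x)$ here) with the right-hand side of (4.4). The threshold is taken as any $\kappa$ exceeding $E|e_t|/(1 - |\phi| - |\pi|)$. The parameter values and function names are illustrative only.

```python
import math
import random


def drift_bound(x, phi, pi_, gamma, mean_abs_e):
    """Right-hand side of (4.4): (|a(x)| - 1)|x| + E|e_t|."""
    a = phi + pi_ * math.exp(-gamma * x * x)
    return (abs(a) - 1.0) * abs(x) + mean_abs_e


def drift_mc(x, phi, pi_, gamma, draws=20000, seed=0):
    """Monte Carlo estimate of E{|X_t| - |X_{t-1}| | X_{t-1} = x} for
    standard normal e_t (an assumption of this sketch)."""
    rng = random.Random(seed)
    a = phi + pi_ * math.exp(-gamma * x * x)
    total = sum(abs(a * x + rng.gauss(0.0, 1.0)) for _ in range(draws))
    return total / draws - abs(x)


phi, pi_, gamma = 0.2, 0.7, 1.0
mean_abs_e = math.sqrt(2.0 / math.pi)                    # E|e_t|, e_t ~ N(0, 1)
kappa = 1.1 * mean_abs_e / (1.0 - abs(phi) - abs(pi_))  # any value above the ratio
for x in (0.5, 5.0, kappa + 0.1, -10.0):
    print(x, drift_mc(x, phi, pi_, gamma),
          drift_bound(x, phi, pi_, gamma, mean_abs_e))
```

By the triangle inequality the Monte Carlo value can exceed the bound only through sampling error in $E|e_t|$. Beyond $\kappa$ the bound is uniformly negative, which is the drift condition used with Tweedie's criterion.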

Since we have a nonlinear AR(1) process, we can take $m = 1$ in Theorems 3.1 and 3.2. The conditions stated in (3.7) will then be trivially fulfilled, and we have, for $t \ge 2$,

$$X_{t|t-1} = E(X_t \mid F^X_{t-1}) = \{\phi + \pi\exp(-\gamma X^2_{t-1})\}X_{t-1}. \tag{4.5}$$

Furthermore, $f_{t|t-1} = E(e_t^2) = \sigma^2$, so that D1 of Theorem 3.2 follows from C1 of Theorem 3.1 (cf. Corollary 3.1), and it is sufficient to verify C1-C3.

Since any moment of $\pi\exp(-\gamma X^2_{t-1})X^k_{t-1}$ exists, it follows from (4.2) and the strict stationarity of $\{X_t\}$ that $E(e_t^2) < \infty$ implies $E(X_t^2) < \infty$. From (4.5) we have

$$\frac{\partial X_{t|t-1}}{\partial\phi} = X_{t-1}, \qquad \frac{\partial^k X_{t|t-1}}{\partial\gamma^k} = (-1)^k\pi\exp(-\gamma X^2_{t-1})X^{2k+1}_{t-1},$$
$$\frac{\partial X_{t|t-1}}{\partial\pi} = \exp(-\gamma X^2_{t-1})X_{t-1}, \qquad \frac{\partial^{k+1} X_{t|t-1}}{\partial\pi\,\partial\gamma^k} = (-1)^k\exp(-\gamma X^2_{t-1})X^{2k+1}_{t-1}, \tag{4.6}$$

for $k = 1, 2, \ldots$, while the other derivatives are zero. It is easily seen that $E(X_t^2) < \infty$ implies that C1 is satisfied. Since $|\phi| + |\pi| < 1$, we have $|X_t - X_{t|t-1}| \le |X_t| + |X_{t-1}|$, and the above derivatives are bounded by $|X_{t-1}|$, $|X_{t-1}|^{2k+1}\exp(-\gamma X^2_{t-1})$, $|X_{t-1}|$ and $|X_{t-1}|^{2k+1}\exp(-\gamma X^2_{t-1})$, respectively. Successive applications of the Schwarz inequality and use of $E(X_t^2) < \infty$ yield C3.

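Let $a_1$, $a_2$ and $a_3$ be three arbitrary real numbers. Then

$$a_1\frac{\partial X_{t|t-1}}{\partial\phi} + a_2\frac{\partial X_{t|t-1}}{\partial\pi} + a_3\frac{\partial X_{t|t-1}}{\partial\gamma} \overset{\text{a.s.}}{=} 0 \tag{4.7}$$

implies

$$X_{t-1}\big[a_1 + \exp(-\gamma X^2_{t-1})\{a_2 - a_3\pi X^2_{t-1}\}\big] \overset{\text{a.s.}}{=} 0, \tag{4.8}$$

and since $E(X_t^2) \ge E(e_t^2) > 0$, it follows that $a_1 = a_2 = a_3 = 0$. Hence C2 holds and the proof is completed. ||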
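The derivative formulas (4.6) can be verified by finite differences. This sketch compares central differences of the predictor (4.5) in $\gamma$ with the closed forms $(-1)^k\pi\exp(-\gamma x^2)x^{2k+1}$ for $k = 1, 2$. It also compares the mixed derivative with $(-1)^k\exp(-\gamma x^2)x^{2k+1}$ for $k = 1$. The test point and step size are arbitrary choices.

```python
import math


def predictor(x_prev, phi, pi_, gamma):
    """X_{t|t-1} of (4.5) as a function of the parameters."""
    return (phi + pi_ * math.exp(-gamma * x_prev ** 2)) * x_prev


def d_gamma_k(x_prev, pi_, gamma, k):
    """Closed-form k-th derivative of X_{t|t-1} with respect to gamma."""
    return (-1) ** k * pi_ * math.exp(-gamma * x_prev ** 2) * x_prev ** (2 * k + 1)


def d_pi_gamma_k(x_prev, gamma, k):
    """Closed-form derivative d^{k+1} X_{t|t-1} / (dpi dgamma^k)."""
    return (-1) ** k * math.exp(-gamma * x_prev ** 2) * x_prev ** (2 * k + 1)


def central_diff(f, h=1e-4):
    """First and second central differences of f at offset 0."""
    d1 = (f(h) - f(-h)) / (2 * h)
    d2 = (f(h) - 2 * f(0.0) + f(-h)) / (h * h)
    return d1, d2


x_prev, phi, pi_, gamma = 0.8, 0.2, 0.7, 1.0
d1, d2 = central_diff(lambda e: predictor(x_prev, phi, pi_, gamma + e))
print(d1, d_gamma_k(x_prev, pi_, gamma, 1))
print(d2, d_gamma_k(x_prev, pi_, gamma, 2))

# dX_{t|t-1}/dpi = exp(-gamma x^2) x, differentiated once more in gamma
dmix, _ = central_diff(lambda e: math.exp(-(gamma + e) * x_prev ** 2) * x_prev)
print(dmix, d_pi_gamma_k(x_prev, gamma, 1))
```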

The infinite support assumption on $\{e_t\}$ can be relaxed. Moreover, it is not absolutely critical that the model (4.2) is initiated with $X_1$ in its stationary invariant distribution. The critical fact is the existence of such a distribution (cf. Klimko and Nelson 1978, Sec. 4).

The general pth order model can be transformed to a first order vector autoregressive model, and essentially the same technique can be used. In this case, the condition $|\phi| + |\pi| < 1$ can be replaced by the condition that there is a matrix norm $\|\cdot\|$ such that $\|\Phi\| + \|\Pi\| < 1$, where

$$\Phi = \begin{bmatrix}\phi^T_{p-1} & \phi_p\\ I_{p-1} & 0\end{bmatrix}, \qquad \Pi = \begin{bmatrix}\pi^T_{p-1} & \pi_p\\ 0 & 0\end{bmatrix},$$

with $\phi_{p-1} = [\phi_1, \ldots, \phi_{p-1}]^T$ and $\pi_{p-1} = [\pi_1, \ldots, \pi_{p-1}]^T$, and where $I_{p-1}$ is the identity matrix of order $p-1$.

It is interesting to consider the special case of an ordinary AR(p) process. Then $\gamma = 0$ and $\phi_i + \pi_i = \alpha_i$. The ergodicity and stationarity condition from Tweedie (1975) reduces to requiring that the zeros of $z^p - \sum_{i=1}^{p}\alpha_i z^{p-i}$ are inside the unit circle of the complex z-plane. Since only first order derivatives of $X_{t|t-1}$ are non-zero, and since $f_{t|t-1} = E(e_t^2) = \sigma^2$ and the derivatives $\partial X_{t|t-1}/\partial\alpha_i = X_{t-i}$, $i = 1, \ldots, p$, are linearly independent, C1-C3 and D1 amount to requiring $E(X_t^2) < \infty$, or $E(e_t^2) < \infty$. Theorems 3.1 and 3.2 thus reduce to the classical consistency and central limit theorems in this situation.
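For the AR(p) special case the stationarity condition can be checked by computing the zeros of $z^p - \sum_{i=1}^{p}\alpha_i z^{p-i}$. The sketch below uses the Durand-Kerner iteration so that it needs only the Python standard library. This is our choice of root finder, and a library routine such as `numpy.roots` would serve equally well. The function names are hypothetical.

```python
def poly_roots(coeffs, iters=500):
    """Durand-Kerner iteration for the roots of the monic polynomial
    z^p + coeffs[0] z^{p-1} + ... + coeffs[p-1]."""
    p = len(coeffs)

    def poly(z):
        v = 1 + 0j
        for c in coeffs:
            v = v * z + c            # Horner's scheme
        return v

    # distinct, non-real starting points (a common choice for this method)
    z = [complex(0.4, 0.9) ** k for k in range(p)]
    for _ in range(iters):
        for i in range(p):
            den = 1 + 0j
            for j in range(p):
                if j != i:
                    den *= z[i] - z[j]
            z[i] = z[i] - poly(z[i]) / den
    return z


def ar_is_stationary(alpha):
    """True if all zeros of z^p - sum_i alpha_i z^{p-i} lie in |z| < 1."""
    roots = poly_roots([-a for a in alpha])
    return all(abs(r) < 1 for r in roots)


print(ar_is_stationary([0.5, 0.3]))   # zeros of z^2 - 0.5 z - 0.3
print(ar_is_stationary([1.2, 0.3]))   # zeros of z^2 - 1.2 z - 0.3
```

The first polynomial has zeros near 0.85 and -0.35, so the first call reports stationarity. The second has a zero near 1.41, so the second call does not.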

A related class of models is the threshold autoregressive processes (Tong and Lim 1980). Unfortunately we have not been able to establish the existence of a stationary invariant initial distribution for these processes. The transition probability $P(x, \cdot)$ is not in general strongly continuous (nor is it weakly continuous), and this makes it difficult to apply Tweedie's (1975) criterion. We will, however, treat the threshold processes in Tjøstheim (1984b).

Another class of related processes is studied by Aase (1984) (see also Jones 1978). In particular Aase looks at models of type

$$X_t - \theta f(X_{t-1}) = e_t. \tag{4.9}$$

The parameter $\theta$ is to be estimated. Clearly

$$X_{t|t-1} = \theta f(X_{t-1}) \tag{4.10}$$

and

$$\frac{\partial X_{t|t-1}}{\partial\theta} = f(X_{t-1}), \tag{4.11}$$

while higher order derivatives are zero. Using the method of the preceding proof it is not difficult to show the following. Suppose that $e_t$ has a density with an infinite support and $E(e_t^2) < \infty$, that $f$ is continuous and non-zero almost everywhere, that there is a constant $c$ such that $|f(x)| \le c|x|$, and that $|\theta| < c^{-1}$. Then the estimate

$$\hat\theta_n = \sum_{t=2}^{n} X_t f(X_{t-1}) \Big/ \sum_{t=2}^{n} f^2(X_{t-1})$$

is strongly consistent and asymptotically normal. Some other examples are discussed by Aase, who uses densities with bounded support and recursively defined estimates.
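The estimate $\hat\theta_n$ is the explicit minimizer of (3.1) for model (4.9), because (4.10) is linear in $\theta$. The following sketch simulates (4.9) and computes $\hat\theta_n$. It assumes $f(x) = \sin x$, which satisfies $|f(x)| \le |x|$ with $c = 1$, together with $\theta = 0.8$ and standard normal $e_t$. None of these choices is taken from Aase's paper, and the function names are hypothetical.

```python
import math
import random


def simulate(n, theta, f, seed=0, burn_in=500):
    """Simulate X_t = theta f(X_{t-1}) + e_t with standard normal e_t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = theta * f(x) + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def theta_hat(x, f):
    """sum_{t>=2} X_t f(X_{t-1}) / sum_{t>=2} f(X_{t-1})^2."""
    num = sum(cur * f(prev) for prev, cur in zip(x[:-1], x[1:]))
    den = sum(f(prev) ** 2 for prev in x[:-1])
    return num / den


x = simulate(20000, 0.8, math.sin, seed=3)
print(theta_hat(x, math.sin))   # should be near the true value 0.8
```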

4.2 Random coefficient autoregressive (RCA) models.

These are defined by allowing random additive perturbations of the AR coefficients of ordinary AR models. Thus a d-dimensional RCA model of order $p$ is defined by

$$X_t - \sum_{i=1}^{p}(a_i + b_{ti})X_{t-i} = e_t \tag{4.12}$$

for $-\infty < t < \infty$. Here $a_i$, $i = 1, \ldots, p$, are deterministic $d \times d$ matrices, whereas $\{b_t(p)\} = \{[b_{t1}, \ldots, b_{tp}]\}$ defines a $d \times pd$ zero-mean matrix process with the $b_t(p)$'s being iid and independent of $\{e_t\}$. Second moments of both $\{b_t(p)\}$ and $\{e_t\}$ will be assumed to exist, and the covariance matrix $G = E(e_te_t^T)$ will be assumed to be nonsingular. We denote by $F^e_t$ and $F^b_t$ the σ-fields generated by $\{e_s, s \le t\}$ and $\{b_s(p), s \le t\}$, and we assume that conditions are fulfilled so that an ergodic, strictly and second order stationary $F^e_t \vee F^b_t$-measurable solution of (4.12) exists. Various conditions for this in terms of the matrices $a_i$, $i = 1, \ldots, p$, and the second moments of $\{b_t(p)\}$ are given in Nicholls and Quinn (1982, Ch. 2).

A least squares estimation theory of RCA processes is developed in Chs. 3 and 7 of Nicholls and Quinn (1982) using regression analysis and conditioning with respect to $F^e_{t-1} \vee F^b_{t-1}$. We will demonstrate that the general framework of estimation developed in this paper can be used to obtain their results, i.e. their Theorems 3.1, 3.2, 7.1 and 7.2.

It is convenient to rewrite (4.12) slightly. Let $b_t = [\mathrm{vec}^T(b_{t1}), \ldots, \mathrm{vec}^T(b_{tp})]^T$, where $\mathrm{vec}(b_{ti})$ is the $d^2$-dimensional column vector obtained by stacking the columns of $b_{ti}$ one on top of the other in order from left to right, and let $a = [\mathrm{vec}^T(a_1), \ldots, \mathrm{vec}^T(a_p)]^T$. Moreover, let $c_1$, $c_2$ and $c_3$ be matrices such that the product $c_1c_2c_3$ is well defined. Applying the formula $\mathrm{vec}(c_1c_2c_3) = (c_3^T \otimes c_1)\mathrm{vec}(c_2)$, where $\otimes$ denotes the tensor (Kronecker) product, we obtain, using vectorization on (4.12),

$$X_t - F(t-1, X)(a + b_t) = e_t. \tag{4.13}$$

Here $F(t-1, X)$ is the $d \times pd^2$ matrix function given by

$$F(t-1, X) = [X^T_{t-1} \otimes I_d, \ldots, X^T_{t-p} \otimes I_d]. \tag{4.14}$$
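The vec identity and the matrix $F(t-1, X)$ of (4.14) can be checked on small examples. The sketch below implements vec, the Kronecker product and $F(t-1, X)$ in plain Python, with helper names of our own choosing. It first confirms the identity $\mathrm{vec}(c_1c_2c_3) = (c_3^T \otimes c_1)\mathrm{vec}(c_2)$ for integer matrices. It then confirms, for $d = p = 2$, that $F(t-1, X)a$ reproduces $a_1X_{t-1} + a_2X_{t-2}$.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def vec(a: Matrix) -> List[float]:
    """Stack the columns of a on top of each other, left to right."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker (tensor) product: block (i, j) of the result is a[i][j] * b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def F(lags: List[List[float]]) -> Matrix:
    """F(t-1, X) of (4.14); lags[i] holds the d-vector X_{t-1-i}."""
    d = len(lags[0])
    eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    blocks = [kron([list(xl)], eye) for xl in lags]   # X^T (x) I_d, d x d^2
    return [sum((blk[r] for blk in blocks), []) for r in range(d)]


# vec(c1 c2 c3) = (c3^T (x) c1) vec(c2)
c1 = [[1, 2], [3, 4]]
c2 = [[0, 1, 2], [3, 4, 5]]
c3 = [[1], [-1], [2]]
lhs = vec(matmul(matmul(c1, c2), c3))
rhs = [row[0] for row in matmul(kron(transpose(c3), c1), [[v] for v in vec(c2)])]
print(lhs, rhs)   # both [21, 45]

# F(t-1, X) a versus a_1 X_{t-1} + a_2 X_{t-2}, with d = p = 2
a1 = [[0.5, 0.1], [0.0, 0.3]]
a2 = [[0.2, 0.0], [0.1, 0.1]]
lags = [[1.0, 2.0], [-1.0, 0.5]]
a_vec = vec(a1) + vec(a2)
fa = [sum(f * coef for f, coef in zip(row, a_vec)) for row in F(lags)]
direct = [sum(a1[i][j] * lags[0][j] + a2[i][j] * lags[1][j] for j in range(2))
          for i in range(2)]
print(fa, direct)
```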

We describe the processes $\{b_t\}$ and $\{e_t\}$ by their covariance matrices $\Lambda$ and $G$, respectively. This amounts to a complete description in the Gaussian case. The parameters of interest are then the $pd^2$ elements of the vector $a$ and the $pd^2(pd^2 + 1)/2 + d(d + 1)/2$ distinct elements of the symmetric matrices $\Lambda$ and $G$.

Theorem 4.2: Under the stated assumptions on $\{X_t\}$ there exists a strongly consistent sequence of estimates $\{\hat a_n\}$ for $a$. If we assume in addition that $E(X^4_{ti}) < \infty$, $i = 1, \ldots, d$, where $X_{ti}$ is the $i$th component of $X_t$, and that $e_t$ cannot take on only two values almost surely, then there exist strongly consistent sequences of estimates $\{\hat\Lambda_n\}$ and $\{\hat G_n\}$ for $\Lambda$ and $G$.

Proof: Note that (3.7) is fulfilled if we take $m = p$, so that we may condition on $F^X_{t-1}$ instead of $F^X_{t-1}(m)$ both in this proof and in the proof of asymptotic normality in the next theorem.

Since $\{X_t\}$ is generated by $\{e_t\}$ and $\{b_t\}$ we have $F^X_t \subset F^e_t \vee F^b_t$, and due to our independence and zero mean assumptions

$$E\{F(t-1,X)b_t \mid F^X_{t-1}\} = F(t-1,X)\,E\{E(b_t \mid F^e_{t-1} \vee F^b_{t-1}) \mid F^X_{t-1}\} = 0. \qquad (4.15)$$


It follows immediately that $\tilde X_{t|t-1} = F(t-1,X)a$. The equation $\partial Q_n/\partial\,\mathrm{vec}(a) = 0$ with $Q_n$ as in (3.1) is linear in $a$, and an explicit and unique solution $\hat a_n$ can be found (cf. Nicholls and Quinn 1982, p. 126).

Using vector notation in derivatives we have $\partial\tilde X_{t|t-1}/\partial\,\mathrm{vec}(a_i) = X_{t-i}^T \otimes I_d$, while the higher order derivatives with respect to $a_i$, $i = 1,\ldots,p$, are zero. Since by assumption $\{X_t\}$ has second moments, it follows at once that C1 and C3 of Theorem 3.1 are fulfilled. The linear independence condition C2, on the other hand, follows from the nonsingularity of the matrix $G$ (cf. Nicholls and Quinn 1982, proof of Th. 2.2, p. 24), and the first part is proved.

To obtain estimates of $A$ and $G$ we use the conditional least squares principle with $X_t$ replaced by

$$v_t = \{X_t - \tilde X_{t|t-1}(a)\}\{X_t - \tilde X_{t|t-1}(a)\}^T = \{F(t-1,X)b_t + e_t\}\{F(t-1,X)b_t + e_t\}^T. \qquad (4.16)$$

Using the same reasoning as when deriving (4.15) we have

$$f_{t|t-1} = \tilde v_{t|t-1} = E(v_t \mid F^X_{t-1}) = F(t-1,X)\,A\,F^T(t-1,X) + G. \qquad (4.17)$$

We are only interested in the $pd^2(pd^2+1)/2 + d(d+1)/2$ distinct elements of $A$ and $G$, and we therefore apply the vech operation to (4.17). We refer to Nicholls and Quinn (1982, Ch. 1) for a description of this operation. We have

$$\mathrm{vech}(\tilde v_{t|t-1}) = H\{F(t-1,X) \otimes F(t-1,X)\}K^T\,\mathrm{vech}(A) + \mathrm{vech}(G), \qquad (4.18)$$

where $H$ and $K$ are constant matrices independent of $A$ and $G$.
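A minimal sketch of the vech operation follows. It assumes the common convention that stacks the on- and below-diagonal entries column by column (we have not checked this against the exact convention in Nicholls and Quinn 1982), and it also counts the parameters listed above; the helper names are hypothetical.

```python
def vech(S):
    """Stack the on- and below-diagonal entries of a symmetric matrix, column by column."""
    n = len(S)
    return [S[i][j] for j in range(n) for i in range(j, n)]

def rca_parameter_count(p, d):
    """pd^2 elements of a plus the distinct elements of A (pd^2 x pd^2) and G (d x d)."""
    q = p * d * d
    return q + q * (q + 1) // 2 + d * (d + 1) // 2

G = [[2.0, 0.5], [0.5, 1.0]]
print(vech(G))                      # [2.0, 0.5, 1.0]
print(rca_parameter_count(1, 1))    # a, A and sigma^2: 3
print(rca_parameter_count(2, 2))    # 8 + 36 + 3 = 47
```

A symmetric $k \times k$ matrix has $k(k+1)/2$ distinct entries, which is exactly the length of its vech.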

We now consider conditional least squares estimates obtained by minimizing

$$Q_n(v) = \sum_{t=p+1}^{n} |\mathrm{vech}(v_t) - \mathrm{vech}(\tilde v_{t|t-1})|^2, \qquad (4.19)$$

where in practice $v_t$ has to be replaced by $\hat v_t = \{X_t - \tilde X_{t|t-1}(\hat a_n)\}\{X_t - \tilde X_{t|t-1}(\hat a_n)\}^T$. From (4.18) it follows that

$$\frac{\partial\,\mathrm{vech}(\tilde v_{t|t-1})}{\partial\,\mathrm{vech}(A)} = H\{F(t-1,X) \otimes F(t-1,X)\}K^T \quad\text{and}\quad \frac{\partial\,\mathrm{vech}(\tilde v_{t|t-1})}{\partial\,\mathrm{vech}(G)} = I_{d(d+1)/2}, \qquad (4.20)$$

while higher order derivatives with respect to the parameters are zero. The fact that $e_t$ cannot take on only two values almost surely now ensures (cf. Nicholls and Quinn 1982, p. 45 and their Lemma 3.1) that C2 of Theorem 3.1 holds, while (4.20) and the existence of fourth order moments of $\{X_t\}$ mean that C1 and C3 are fulfilled, so that the estimates $\hat A_n$ and $\hat G_n$ obtained by minimizing (4.19) are strongly consistent. The equivalence of using $v_t$ and $\hat v_t$ as $n \to \infty$ is proved in Nicholls and Quinn (1982, Appendix 7.1), where explicit expressions for $\hat A_n$ and $\hat G_n$ are also given. ||
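For the scalar case $d = p = 1$ the two steps reduce to ordinary regressions: $\hat a_n$ regresses $X_t$ on $X_{t-1}$, and (4.19) regresses $\hat v_t$ on $X_{t-1}^2$ and a constant, because then $\tilde v_{t|t-1} = AX_{t-1}^2 + \sigma^2$ with $\sigma^2 = G$. The sketch below simulates such a process with Gaussian $b_t$ and $e_t$ (an assumption made only for the simulation; parameter values and function names are hypothetical) and applies both stages.

```python
import random

def simulate_rca1(n, a, A, sigma2, seed=1, burn=500):
    """Scalar RCA(1): X_t = (a + b_t) X_{t-1} + e_t, b_t ~ N(0, A), e_t ~ N(0, sigma2)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn):
        x = (a + rng.gauss(0.0, A ** 0.5)) * x + rng.gauss(0.0, sigma2 ** 0.5)
        if t >= burn:
            out.append(x)
    return out

def cls_rca1(xs):
    """Two-stage conditional least squares for p = d = 1."""
    lag, cur = xs[:-1], xs[1:]
    # Stage 1: minimize sum (X_t - a X_{t-1})^2, cf. (3.1).
    a_hat = sum(u * w for u, w in zip(cur, lag)) / sum(w * w for w in lag)
    # Stage 2: regress v_hat_t = (X_t - a_hat X_{t-1})^2 on (X_{t-1}^2, 1), cf. (4.19).
    v = [(u - a_hat * w) ** 2 for u, w in zip(cur, lag)]
    z = [w * w for w in lag]
    n = len(v)
    zbar, vbar = sum(z) / n, sum(v) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szv = sum((zi - zbar) * (vi - vbar) for zi, vi in zip(z, v))
    A_hat = szv / szz
    sigma2_hat = vbar - A_hat * zbar
    return a_hat, A_hat, sigma2_hat

xs = simulate_rca1(20000, a=0.5, A=0.04, sigma2=1.0)
a_hat, A_hat, sigma2_hat = cls_rca1(xs)
print(round(a_hat, 3), round(A_hat, 3), round(sigma2_hat, 3))
```

The parameter values are chosen so that $E\{(a + b_t)^8\} < 1$, which keeps the eighth moments finite, in line with the moment conditions for the second stage.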

For genuine RCA processes with $A \neq 0$, the conditional prediction error matrix given in (4.17) will be stochastic, and condition D1 of Theorem 3.2 in this case implies that some extra conditions are needed on $\{X_t\}$ to obtain asymptotic normality.

Theorem 4.3: The estimate $\hat a_n$ of Theorem 4.2 is asymptotically normal if we assume in addition that $E(X_{ti}^4) < \infty$, $i = 1,\ldots,d$. Similarly $\hat A_n$ and $\hat G_n$ are asymptotically normal if in addition to the conditions of Theorem 4.2 we assume $E(X_{ti}^8) < \infty$, $i = 1,\ldots,d$.

Proof: Using (4.17) we have that the matrix $R$ in the condition D1 is given by

$$R = E[F^T(t-1,X)\{F(t-1,X)\,A\,F^T(t-1,X) + G\}F(t-1,X)], \qquad (4.21)$$

and it follows from (4.14) and Theorem 3.2 that existence of 4th order moments is sufficient to guarantee asymptotic normality of $\hat a_n$. The matrix $U$ defined in (3.7) is given by $U = E\{F^T(t-1,X)F(t-1,X)\}$, from which the limiting covariance matrix of $n^{1/2}\{\mathrm{vec}(\hat a_n) - \mathrm{vec}(a)\}$ can be obtained from (3.8) and (4.21). The covariance matrix is given in a slightly different form in Nicholls and Quinn (1982, p. 127), but the connection can be established with simple tensor product operations.

When it comes to the parameter vectors $\gamma = \mathrm{vech}(A)$ and $g = \mathrm{vech}(G)$, Theorem 3.2 can again be used if $X_t$ is replaced by $w_t = \mathrm{vech}(v_t)$ with $v_t$ defined in (4.16). Again it can be shown that the difference between using $v_t$ and $\hat v_t$ has no influence on asymptotic results (cf. Nicholls and Quinn 1982, Appendix 7.1).

To ease the comparison with Nicholls and Quinn (1982) we use the notation $Z_t = H\{F(t-1,X) \otimes F(t-1,X)\}K^T$. Using (4.20) it is not difficult to show that corresponding to the matrix $R$ of the condition D1 we obtain


$$R_w = E\left\{\begin{bmatrix} Z_t^T \\ I_{d(d+1)/2} \end{bmatrix} f^w_{t|t-1} \begin{bmatrix} Z_t & I_{d(d+1)/2} \end{bmatrix}\right\}, \qquad (4.22)$$

where $f^w_{t|t-1} = E\{(w_t - \tilde w_{t|t-1})(w_t - \tilde w_{t|t-1})^T \mid F^X_{t-1}\}$ and where $\tilde w_{t|t-1}$ is given in (4.18). From the definition of $w_t$ and $Z_t$ it is seen that the condition D1 of Theorem 3.2 now requires the existence of 8th order moments of $\{X_t\}$, which is the condition given by Nicholls and Quinn (1982) in their theorems 3.2 and 7.2.

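From (4.20) it follows that corresponding to (3.7) we have

$$U = E\left\{\begin{bmatrix} Z_t^T \\ I_{d(d+1)/2} \end{bmatrix} \begin{bmatrix} Z_t & I_{d(d+1)/2} \end{bmatrix}\right\}, \qquad (4.23)$$

and we can now easily recover the formulae for the asymptotic covariance of $(\hat\gamma_n, \hat g_n)$ given in Nicholls and Quinn (1982, p. 132) from the general formula (3.8). ||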


This reasoning only gives the asymptotic marginal distributions of $\hat a_n$ and $(\hat\gamma_n, \hat g_n)$, respectively. To prove joint asymptotic normality of $(\hat a_n, \hat\gamma_n, \hat g_n)$ we can consider the penalty function

(4.24)

and apply basically the same technique (cf. Theorem 2.2).


4.3 Bilinear models

This class of models has received considerable attention recently. We refer to Granger and Andersen (1978), Subba Rao (1981) and Bhaskara Rao et al. (1983) and references therein. A scalar bilinear time series $\{X_t\}$ of type BL(p,q,m,k) is defined for all integers $t$ by the difference equation

$$X_t - a_0 - \sum_{i=1}^{p} a_i X_{t-i} = e_t + \sum_{i=1}^{q} c_i e_{t-i} + \sum_{i=1}^{m}\sum_{j=1}^{k} b_{ij} X_{t-i} e_{t-j}. \qquad (4.25)$$


Most of the work in the literature has been concerned with finding stationary $F^e_t$-measurable solutions to (4.25) and with the evaluation of parameters from the data. We are not aware of a theory of statistical inference for these models, except in rather special cases (cf. Hall and Heyde 1980, Sec. 6.5). Using our general framework we have only been able to treat some special bilinear series, and we will point out the reason for failure in the general case.

Guegan (1983) examines conditions for stationarity for the bilinear model

$$X_t - aX_{t-1} = ce_t + bX_{t-1}e_t + d(e_t^2 - 1), \qquad (4.26)$$

where $\{e_t\}$ is a zero-mean Gaussian white noise with variance 1. Guegan shows that there is an ergodic, strictly and second order stationary solution of (4.26) if $E\{(a + be_t)^2\} < 1$. We can choose $m = 1$ in (3.7), and since $E(e_t) = E(e_t^3) = 0$ and $\mathrm{Var}(e_t^2) = 2$ we have $\tilde X_{t|t-1} = aX_{t-1}$ and $f_{t|t-1} = (bX_{t-1} + c)^2 + 2d^2$. It follows directly from Theorems 3.1 and 3.2 that

$$\hat a_n = \Big(\sum_{t=2}^{n} X_t X_{t-1}\Big)\Big(\sum_{t=2}^{n} X_{t-1}^2\Big)^{-1}$$

is a strongly consistent estimate for $a$, and that if we assume $E(X_t^4) < \infty$, it is also asymptotically normal. Estimates of $b$, $c$ and $d$ can be treated by using Theorems 3.1 and 3.2 on $v_t = \{bX_{t-1}e_t + ce_t + d(e_t^2 - 1)\}^2$.
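The estimate $\hat a_n$ is a plain regression of $X_t$ on $X_{t-1}$. The sketch below simulates (4.26) with hypothetical parameter values satisfying $E\{(a + be_t)^2\} = a^2 + b^2 < 1$ and computes $\hat a_n$; the function names are illustrative only.

```python
import random

def simulate_bilinear(n, a, b, c, d, seed=2, burn=500):
    """Simulate (4.26): X_t = a X_{t-1} + c e_t + b X_{t-1} e_t + d (e_t^2 - 1), e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn):
        e = rng.gauss(0.0, 1.0)
        x = a * x + c * e + b * x * e + d * (e * e - 1.0)  # right side uses X_{t-1}
        if t >= burn:
            out.append(x)
    return out

def cls_a(xs):
    """a_hat = (sum X_t X_{t-1}) / (sum X_{t-1}^2), the conditional least squares estimate."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den

xs = simulate_bilinear(20000, a=0.4, b=0.3, c=1.0, d=0.5)
a_hat = cls_a(xs)
print(round(a_hat, 3))
```

Here $a^2 + b^2 = 0.25$ and $E\{(a + be_t)^4\} < 1$, so the fourth moment condition for asymptotic normality also holds.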

The model (4.26) has a very special structure, since no "past" $e_t$'s are allowed. This guarantees that $\{X_t\}$ is a Markov process. We refer to Guegan (1983) for some generalizations.

The difficulties in the general case are illustrated by conditioning on $F^X_{t-1}$ in (4.25). We obtain

$$\tilde X_{t|t-1} = a_0 + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{i=1}^{q} c_i E(e_{t-i} \mid F^X_{t-1}) + \sum_{i=1}^{m}\sum_{j=1}^{k} b_{ij} X_{t-i} E(e_{t-j} \mid F^X_{t-1}). \qquad (4.27)$$

The conditional expectations $E(e_{t-j} \mid F^X_{t-1})$ will in general depend nonlinearly on the parameters and on $\{X_s, s \le t-1\}$, and thus the derivatives with respect to the parameters will in general be infinite expansions in terms of $\{X_s, s \le t-1\}$ as well. The conditions C1, C3 and D1 essentially require mean square convergence of such expressions and are thus intimately connected with the invertibility problem of bilinear models, i.e. the problem of expressing $e_t$ in terms of a properly convergent nonlinear series of past $X_t$'s. This problem seems very complicated (cf. Granger and Andersen 1978, Ch. 8), and until more progress is made it appears to be difficult to make substantial headway in conditional least squares estimation of bilinear series using the present framework.

5. A maximum likelihood type penalty function.

In all of the following it will be assumed that the conditional prediction error matrix $f_{t|t-1}$ is nonsingular and that there exists an $m$ such that (3.7) holds.

Corresponding to weighted least squares estimation we introduce the conditional weighted sum of squares penalty function

$$L^0_n(\beta) = \sum_{t=m+1}^{n} (X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1} (X_t - \tilde X_{t|t-1}). \qquad (5.1)$$


We would still like to base our reasoning on the general Theorems 2.1 and 2.2. It was essential in Section 3 that $\{\partial Q_n(\beta^0)/\partial\beta_i, F^X_n\}$ was a zero-mean martingale. We have the following result, where Tr and det are abbreviations for trace and determinant.

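Proposition 5.1: Let $\phi^0_t = (X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1})$ denote the $t$th term of $L^0_n$ in (5.1). Then

$$E\Big\{\frac{\partial\phi^0_t}{\partial\beta_i}(\beta^0) \,\Big|\, F^X_{t-1}\Big\} = -\mathrm{Tr}\Big\{f^{-1}_{t|t-1}(\beta^0)\frac{\partial f_{t|t-1}}{\partial\beta_i}(\beta^0)\Big\} = -\frac{\partial}{\partial\beta_i}\ln[\det\{f_{t|t-1}(\beta^0)\}]. \qquad (5.2)$$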


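Proof: We have

$$\frac{\partial\phi^0_t}{\partial\beta_i} = -2\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) - (X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}). \qquad (5.3)$$

Using the definition of $\tilde X_{t|t-1}$ and standard rules of the trace operation we have for $\beta = \beta^0$

$$\begin{aligned}
E\Big\{\frac{\partial\phi^0_t}{\partial\beta_i}\,\Big|\,F^X_{t-1}\Big\} &= -E\Big\{(X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) \,\Big|\, F^X_{t-1}\Big\} \\
&= -\mathrm{Tr}\Big[E\{(X_t - \tilde X_{t|t-1})(X_t - \tilde X_{t|t-1})^T \mid F^X_{t-1}\} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\Big] = -\mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\Big\}. \qquad (5.4)
\end{aligned}$$

Here the first term of (5.3) drops out because $E(X_t - \tilde X_{t|t-1} \mid F^X_{t-1}) = 0$. The last expression in (5.4) can be written as $-\partial/\partial\beta_i\{\mathrm{Tr}(\ln f_{t|t-1})\}$. However, for a symmetric positive definite matrix $A$ we have $\mathrm{Tr}(\ln A) = \ln(\det A)$, and (5.2) follows. ||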

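This proposition shows that $\{\partial\phi^0_t(\beta^0)/\partial\beta_i\}$, where $\phi^0_t$ is the $t$th term of (5.1), is not a martingale difference sequence with respect to $\{F^X_t\}$, but if we introduce an adjustment term corresponding to (5.2) we obtain a penalty function

$$L_n(\beta) = \sum_{t=m+1}^{n}\big[\ln\{\det(f_{t|t-1})\} + (X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1})\big] = \sum_{t=m+1}^{n}\phi_t(\beta), \qquad (5.5)$$

which has the required martingale property.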

If $\{X_t\}$ is a conditional Gaussian process, then $L_n$ coincides with the conditional log likelihood function except for a multiplicative and an additive constant. However, in this paper we will not restrict ourselves to Gaussian processes and a likelihood interpretation, but rather view $L_n$ as a general penalty function which, since it has the martingale property for a general $\{X_t\}$, can be subjected to the kind of analysis described in Sections 2 and 3.
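To make the relation to the likelihood concrete, the following sketch evaluates $L_n$ of (5.5) for a hypothetical scalar RCA(1) model, where $\tilde X_{t|t-1} = aX_{t-1}$ and $f_{t|t-1} = AX_{t-1}^2 + \sigma^2$ (cf. Section 6), and checks on a short made-up series that $L_n$ equals $-2$ times the conditional Gaussian log likelihood minus $N\ln 2\pi$, where $N$ is the number of terms. The function names are illustrative only.

```python
import math

def penalty_Ln(xs, a, A, sigma2):
    """L_n of (5.5) for a scalar RCA(1) model (hypothetical helper)."""
    total = 0.0
    for t in range(1, len(xs)):
        f = A * xs[t - 1] ** 2 + sigma2          # f_{t|t-1}
        u = xs[t] - a * xs[t - 1]                # X_t - X~_{t|t-1}
        total += math.log(f) + u * u / f         # phi_t
    return total

def gaussian_loglik(xs, a, A, sigma2):
    """Conditional Gaussian log likelihood given the first observation."""
    total = 0.0
    for t in range(1, len(xs)):
        f = A * xs[t - 1] ** 2 + sigma2
        u = xs[t] - a * xs[t - 1]
        total += -0.5 * (math.log(2 * math.pi * f) + u * u / f)
    return total

xs = [0.3, -1.2, 0.8, 2.1, -0.4, 0.0, 1.5]
Ln = penalty_Ln(xs, 0.5, 0.2, 1.0)
ll = gaussian_loglik(xs, 0.5, 0.2, 1.0)
N = len(xs) - 1
print(abs(Ln - (-2 * ll - N * math.log(2 * math.pi))) < 1e-9)
```

Because the additive and multiplicative constants do not depend on the parameters, minimizing $L_n$ and maximizing the conditional Gaussian likelihood select the same estimates in this case.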

The analysis of $L_n$ will differ in an essential way from that based on conditional least squares only in the cases where $f_{t|t-1}$ is a genuine stochastic process, i.e. when $f_{t|t-1}$ is not independent of $F^X_{t-1}$. For the examples treated in detail in Section 4 this is the case only for the RCA processes. More general state space models of this type will be treated in Tjøstheim (1984b). As will be seen, using $L_n$ it is sometimes possible to relax moment conditions on $\{X_t\}$.

5.1 Consistency

We denote by $s$ the number of components of the parameter vector $\beta$ appearing in $L_n$. Due to the presence of $f_{t|t-1}$ in $L_n$, in general $s > r$ with $r$ defined as in Theorem 3.1.

Theorem 5.1: Assume that $\{X_t\}$ is a $d$-dimensional strictly stationary and ergodic process with $E(|X_t|^2) < \infty$, and that $\tilde X_{t|t-1}(\beta)$ and $f_{t|t-1}(\beta)$ are almost surely three times continuously differentiable in an open set $B$ containing $\beta^0$. Moreover, if $\phi_t$ is defined by (5.5), assume that

E1: $E\{|\partial\phi_t(\beta^0)/\partial\beta_i|\} < \infty$ and $E\{|\partial^2\phi_t(\beta^0)/\partial\beta_i\partial\beta_j|\} < \infty$ for $i, j = 1,\ldots,s$, where expressions for these derivatives are given in (5.8) and (5.9).

E2: For arbitrary real numbers $\alpha_1,\ldots,\alpha_s$ such that for $\beta = \beta^0$

$$E\Big\{\Big|f^{-1}_{t|t-1}\sum_{i=1}^{s}\alpha_i\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big|^2\Big\} + E\Big\{\Big|(f^{-1}_{t|t-1}\otimes f^{-1}_{t|t-1})\sum_{i=1}^{s}\alpha_i\frac{\partial\,\mathrm{vec}(f_{t|t-1})}{\partial\beta_i}\Big|^2\Big\} = 0, \qquad (5.6)$$

we have $\alpha_1 = \alpha_2 = \cdots = \alpha_s = 0$.

E3: For $\beta \in B$, there exists a function $H^t_{ijk}(X_1,\ldots,X_t)$ such that

$$\Big|\frac{\partial^3\phi_t(\beta)}{\partial\beta_i\partial\beta_j\partial\beta_k}\Big| \le H^t_{ijk}, \qquad E(H^t_{ijk}) < \infty,$$

for $i, j, k = 1,\ldots,s$.

Then there exists a sequence of estimators $\{\hat\beta_n\}$ minimizing $L_n$ of (5.5) such that the conclusion of Theorem 2.1 holds.

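Proof: Due to stationarity and ergodicity and the first part of E1, we have $n^{-1}\partial L_n(\beta^0)/\partial\beta_i \to E\{\partial\phi_t(\beta^0)/\partial\beta_i\}$ a.s. as $n \to \infty$. However, because of the martingale increment property just demonstrated for $\{\partial\phi_t(\beta^0)/\partial\beta_i\}$ we have $E\{\partial\phi_t(\beta^0)/\partial\beta_i\} = E[E\{\partial\phi_t(\beta^0)/\partial\beta_i \mid F^X_{t-1}\}] = 0$, and A1 of Theorem 2.1 follows. Similarly, A3 of that theorem follows from E3 and the ergodic theorem.

Using the last part of E1 and the ergodic theorem we have

$$n^{-1}\frac{\partial^2 L_n}{\partial\beta_i\partial\beta_j}(\beta^0) \xrightarrow{\text{a.s.}} E\Big\{\frac{\partial^2\phi_t}{\partial\beta_i\partial\beta_j}(\beta^0)\Big\} = E\Big[E\Big\{\frac{\partial^2\phi_t}{\partial\beta_i\partial\beta_j}(\beta^0)\,\Big|\,F^X_{t-1}\Big\}\Big] = V'_{ij}. \qquad (5.7)$$

It remains to show that E2 implies that the matrix $V' = (V'_{ij})$ is positive definite.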


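For this purpose we will give explicit expressions for $\partial\phi_t/\partial\beta_i$ and $\partial^2\phi_t/\partial\beta_i\partial\beta_j$, since these will also be useful when checking the conditions E1 and E3. From the definition of $\phi_t$ in (5.5) and from (5.3) and (5.4) we have

$$\frac{\partial\phi_t}{\partial\beta_i} = \mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\Big\} - 2\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) - (X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) \qquad (5.8)$$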


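and similarly

$$\begin{aligned}
\frac{\partial^2\phi_t}{\partial\beta_i\partial\beta_j} ={}& \mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial^2 f_{t|t-1}}{\partial\beta_i\partial\beta_j}\Big\} - \mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big\} \\
& - 2\Big(\frac{\partial^2\tilde X_{t|t-1}}{\partial\beta_i\partial\beta_j}\Big)^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) + 2\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j} \\
& + 2\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) + 2\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j}\Big)^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) \\
& + 2(X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}) \\
& - (X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}\frac{\partial^2 f_{t|t-1}}{\partial\beta_i\partial\beta_j} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1}). \qquad (5.9)
\end{aligned}$$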

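Since, for an $F^X_{t-1}$-measurable $d \times d$ matrix function $C(t-1,X)$, we have for $\beta = \beta^0$

$$E\{(X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}C(t-1,X)(X_t - \tilde X_{t|t-1}) \mid F^X_{t-1}\} = \mathrm{Tr}[E\{(X_t - \tilde X_{t|t-1})(X_t - \tilde X_{t|t-1})^T \mid F^X_{t-1}\}f^{-1}_{t|t-1}C(t-1,X)] = \mathrm{Tr}\{C(t-1,X)\}, \qquad (5.10)$$

it is easily verified that

$$E\Big\{\frac{\partial^2\phi_t}{\partial\beta_i\partial\beta_j}\,\Big|\,F^X_{t-1}\Big\} = \mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big\} + 2\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j}. \qquad (5.11)$$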

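However, using standard rules about tensor products and trace operations we have for $\beta = \beta^0$

$$\mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big\} = \mathrm{vec}^T\Big(\frac{\partial f_{t|t-1}}{\partial\beta_i}\Big)(f^{-1}_{t|t-1}\otimes f^{-1}_{t|t-1})\,\mathrm{vec}\Big(\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big). \qquad (5.12)$$

Thus if $\alpha_1,\ldots,\alpha_s$ are arbitrary real numbers, then it follows from (5.11) and (5.12) that

$$\begin{aligned}
\sum_{i=1}^{s}\sum_{j=1}^{s}\alpha_i\alpha_j E\Big(\frac{\partial^2\phi_t}{\partial\beta_i\partial\beta_j}\Big) ={}& 2E\Big\{\Big(\sum_{i=1}^{s}\alpha_i\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}\Big(\sum_{i=1}^{s}\alpha_i\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)\Big\} \\
& + E\Big\{\Big(\sum_{i=1}^{s}\alpha_i\frac{\partial\,\mathrm{vec}(f_{t|t-1})}{\partial\beta_i}\Big)^T(f^{-1}_{t|t-1}\otimes f^{-1}_{t|t-1})\Big(\sum_{i=1}^{s}\alpha_i\frac{\partial\,\mathrm{vec}(f_{t|t-1})}{\partial\beta_i}\Big)\Big\} \ge 0. \qquad (5.13)
\end{aligned}$$

Hence the matrix $V'$ defined in (5.7) is non-negative definite, and due to the positive definiteness of $f_{t|t-1}$ it now follows from (5.13) and E2 that $V'$ is in fact positive definite and the theorem is proved. ||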

Since some of the components of $\beta = [\beta_1,\ldots,\beta_s]$ may be missing from either $\tilde X_{t|t-1}$ or $f_{t|t-1}$ (cf. the RCA case in Section 4.2), the condition E2 essentially requires the linear independence of the non-zero terms of both $\partial\tilde X_{t|t-1}/\partial\beta_i$ and $\partial f_{t|t-1}/\partial\beta_i$. The weighting with $f_{t|t-1}$ is necessary to ensure existence of second moments.

The verification of E1 and E3 must proceed from (5.8) and (5.9), where for $\beta = \beta^0$ the formula (5.10) may be useful. We will see that in the scalar RCA case it is possible to obtain bounds with probability one on quantities like $f^{-1}_{t|t-1}\,\partial\tilde X_{t|t-1}/\partial\beta_i$, in which case the bounds in E1 and E3 are not very restrictive.
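When E1 and E3 are checked by hand, slips in (5.8) are easy to make, and a cheap safeguard is to compare the analytic derivative with a central finite difference. The sketch below does this for the scalar RCA(1) case with $\beta = (a, A, \sigma^2)$, where $\partial\tilde X_{t|t-1}/\partial a = X_{t-1}$, $\partial f_{t|t-1}/\partial A = X_{t-1}^2$ and $\partial f_{t|t-1}/\partial\sigma^2 = 1$; the numbers and function names are hypothetical.

```python
import math

def phi(x_t, x_lag, beta):
    """phi_t of (5.5) for scalar RCA(1); beta = (a, A, sigma2)."""
    a, A, sigma2 = beta
    f = A * x_lag ** 2 + sigma2
    u = x_t - a * x_lag
    return math.log(f) + u * u / f

def grad_phi(x_t, x_lag, beta):
    """Scalar form of (5.8): f^-1 df - 2 dX~ f^-1 u - u^2 f^-2 df."""
    a, A, sigma2 = beta
    f = A * x_lag ** 2 + sigma2
    u = x_t - a * x_lag
    dX = (x_lag, 0.0, 0.0)        # dX~/d(a, A, sigma2)
    df = (0.0, x_lag ** 2, 1.0)   # df/d(a, A, sigma2)
    return [df[i] / f - 2 * dX[i] * u / f - u * u * df[i] / f ** 2 for i in range(3)]

def numeric_grad(x_t, x_lag, beta, h=1e-6):
    """Central differences of phi in each coordinate of beta."""
    g = []
    for i in range(3):
        up, dn = list(beta), list(beta)
        up[i] += h
        dn[i] -= h
        g.append((phi(x_t, x_lag, up) - phi(x_t, x_lag, dn)) / (2 * h))
    return g

beta0 = (0.5, 0.2, 1.0)
analytic = grad_phi(1.3, -0.7, beta0)
numeric = numeric_grad(1.3, -0.7, beta0)
print(max(abs(p - q) for p, q in zip(analytic, numeric)))
```

The same check extends to (5.9) by differencing the analytic first derivatives.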

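5.2 Asymptotic normality.

To ease comparison with the results of Section 3 we introduce the matrix $U'$ defined by $U' = \tfrac12 V'$, where $V' = (V'_{ij})$ is given by (5.7). Using (5.11), $U'$ is given for $\beta = \beta^0$ by

$$U'_{ij} = E\Big[\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j} + \frac12\Big\{\frac{\partial\,\mathrm{vec}(f_{t|t-1})}{\partial\beta_i}\Big\}^T(f^{-1}_{t|t-1}\otimes f^{-1}_{t|t-1})\frac{\partial\,\mathrm{vec}(f_{t|t-1})}{\partial\beta_j}\Big]. \qquad (5.14)$$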

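Corresponding to Theorem 3.2 we have

Theorem 5.2: Assume that the conditions of Theorem 5.1 are fulfilled and that for $\beta = \beta^0$ and $i, j = 1,\ldots,s$

F1: the expectation

$$\begin{aligned}
S_{ij} = \frac14 E\Big(& 2\,\mathrm{Tr}\Big[E\Big\{(X_t - \tilde X_{t|t-1})\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\Big)^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1})(X_t - \tilde X_{t|t-1})^T \,\Big|\, F^X_{t-1}\Big\} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j} f^{-1}_{t|t-1}\Big] \\
&+ 2\,\mathrm{Tr}\Big[E\Big\{(X_t - \tilde X_{t|t-1})\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j}\Big)^T f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1})(X_t - \tilde X_{t|t-1})^T \,\Big|\, F^X_{t-1}\Big\} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\Big] \\
&+ \mathrm{Tr}\Big[E\Big\{(X_t - \tilde X_{t|t-1})(X_t - \tilde X_{t|t-1})^T f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}(X_t - \tilde X_{t|t-1})(X_t - \tilde X_{t|t-1})^T \,\Big|\, F^X_{t-1}\Big\} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j} f^{-1}_{t|t-1}\Big] \\
&- \mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\Big\}\mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big\} - 2\,\mathrm{Tr}\Big\{f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i} f^{-1}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big\}\Big) \qquad (5.15)
\end{aligned}$$

exists and is finite.

Let $S = (S_{ij})$, and let $\{\hat\beta_n\}$ be the estimators obtained in Theorem 5.1. Then we have $S_{ij} = \tfrac14 E\{(\partial\phi_t/\partial\beta_i)(\partial\phi_t/\partial\beta_j)\} - U'_{ij}$ and

$$n^{1/2}(\hat\beta_n - \beta^0) \xrightarrow{d} N\big(0, (U')^{-1} + (U')^{-1}S(U')^{-1}\big). \qquad (5.16)$$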


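Proof: We use the same technique as in the proof of Theorem 3.2. From the martingale central limit theorem in the strictly stationary ergodic situation and a Cramér-Wold argument, it follows that $n^{-1/2}\partial L_n(\beta^0)/\partial\beta$ has a multivariate normal distribution as its limiting distribution if the limiting covariance of this quantity exists. Using Theorem 2.2 this implies asymptotic normality of $\hat\beta_n$, and what remains is to evaluate the covariance matrix.

Since $\{\partial L_n(\beta^0)/\partial\beta_i, F^X_n\}$ is a martingale, it is easy to verify that

$$E\Big\{n^{-1/2}\frac{\partial L_n(\beta^0)}{\partial\beta_i}\,n^{-1/2}\frac{\partial L_n(\beta^0)}{\partial\beta_j}\Big\} = n^{-1}\sum_{t=m+1}^{n}E\Big[E\Big\{\frac{\partial\phi_t(\beta^0)}{\partial\beta_i}\frac{\partial\phi_t(\beta^0)}{\partial\beta_j}\,\Big|\,F^X_{t-1}\Big\}\Big]. \qquad (5.17)$$

Using (3.3), (5.8) and (5.10) it is not difficult to show that for $\beta = \beta^0$

$$E\Big\{\frac{\partial\phi_t}{\partial\beta_i}\frac{\partial\phi_t}{\partial\beta_j}\,\Big|\,F^X_{t-1}\Big\} = 4(s_{ij} + u'_{ij}), \qquad (5.18)$$

where $s_{ij}$ and $u'_{ij}$ are the $F^X_{t-1}$-measurable quantities whose expectations define $S_{ij}$ in (5.15) and $U'_{ij}$ in (5.14), so that $S_{ij} = E(s_{ij})$ and $U'_{ij} = E(u'_{ij})$. The finiteness of $E\{n^{-1/2}\partial L_n(\beta^0)/\partial\beta_i \cdot n^{-1/2}\partial L_n(\beta^0)/\partial\beta_j\}$ now follows from the assumptions E1 and F1, while the form of the covariance matrix in (5.16) follows from (2.4) and the definitions of $S$ and $U'$. ||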

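For a conditional Gaussian process we have that $X_t$ is Gaussian conditional on $F^X_{t-1}$ with mean $\tilde X_{t|t-1}$ and covariance matrix $f_{t|t-1}$, and for $\beta = \beta^0$ we have

$$E\{(X_{ti} - \tilde X_{t|t-1,i})(X_{tj} - \tilde X_{t|t-1,j})(X_{tk} - \tilde X_{t|t-1,k}) \mid F^X_{t-1}\} = 0 \qquad (5.19)$$

for arbitrary components $i, j, k = 1,\ldots,d$. Moreover, from well-known properties of the multivariate normal distribution we have

$$E\{(X_{ti} - \tilde X_{t|t-1,i})(X_{tj} - \tilde X_{t|t-1,j})(X_{tk} - \tilde X_{t|t-1,k})(X_{tm} - \tilde X_{t|t-1,m}) \mid F^X_{t-1}\} = f_{t|t-1,ik}f_{t|t-1,jm} + f_{t|t-1,im}f_{t|t-1,jk} + f_{t|t-1,ij}f_{t|t-1,km} \qquad (5.20)$$

for $i, j, k, m = 1,\ldots,d$. Using this in conjunction with (5.15) it is not difficult to show that $S = 0$ in this case, and (5.16) reduces to

$$n^{1/2}(\hat\beta_n - \beta^0) \xrightarrow{d} N\big(0, (U')^{-1}\big). \qquad (5.21)$$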
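Formula (5.20) can be checked by simulation. The sketch below draws from a bivariate normal distribution with a hypothetical covariance matrix $f$ via its Cholesky factor and compares a Monte Carlo average with the right-hand side of (5.20); for $(i,j,k,m) = (1,1,2,2)$ this is $f_{11}f_{22} + 2f_{12}^2$. Indices in the code are zero-based, and the function names are illustrative only.

```python
import random

def isserlis(f, i, j, k, m):
    """Right-hand side of (5.20) for a covariance matrix f (zero-based indices)."""
    return f[i][k] * f[j][m] + f[i][m] * f[j][k] + f[i][j] * f[k][m]

def mc_fourth_moment(f, idx, n=100000, seed=3):
    """Monte Carlo estimate of E(u_i u_j u_k u_m) for u ~ N(0, f), f a 2 x 2 covariance."""
    rng = random.Random(seed)
    l11 = f[0][0] ** 0.5                    # Cholesky factor of f
    l21 = f[1][0] / l11
    l22 = (f[1][1] - l21 ** 2) ** 0.5
    acc = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = (l11 * z1, l21 * z1 + l22 * z2)
        acc += u[idx[0]] * u[idx[1]] * u[idx[2]] * u[idx[3]]
    return acc / n

f = [[1.0, 0.5], [0.5, 2.0]]
exact = isserlis(f, 0, 0, 1, 1)      # 1.0 * 2.0 + 2 * 0.5**2 = 2.5
approx = mc_fourth_moment(f, (0, 0, 1, 1))
print(exact, round(approx, 2))
```

With (5.19) and (5.20) the conditional third and fourth moments are fully determined by $f_{t|t-1}$, which is why $S$ vanishes in the Gaussian case.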


In the case where $f_{t|t-1}$ does not depend on the parameter $\beta$ of interest, we also have $S = 0$ and $U' = E\{(\partial\tilde X_{t|t-1}(\beta^0)/\partial\beta)^T f^{-1}_{t|t-1}\,\partial\tilde X_{t|t-1}(\beta^0)/\partial\beta\}$. Under the additional assumption of Corollary 3.1 we have $U' = E[(\partial\tilde X_{t|t-1}(\beta^0)/\partial\beta)^T\{E(f_{t|t-1})\}^{-1}\,\partial\tilde X_{t|t-1}(\beta^0)/\partial\beta]$, and estimation using $L_n$ and $Q_n$ of (3.1) essentially gives identical results.

For a scalar process $\{X_t\}$ our general formulae simplify considerably. We get for $\beta = \beta^0$


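$$U'_{ij} = E\Big\{f^{-1}_{t|t-1}\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j} + \frac12 f^{-2}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\frac{\partial f_{t|t-1}}{\partial\beta_j}\Big\} \qquad (5.22)$$

and

$$\begin{aligned}
S_{ij} = \frac14 E\Big[& f^{-4}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\frac{\partial f_{t|t-1}}{\partial\beta_j}\big\{E((X_t - \tilde X_{t|t-1})^4 \mid F^X_{t-1}) - 3f^2_{t|t-1}\big\} \\
& + 2f^{-3}_{t|t-1}\Big(\frac{\partial\tilde X_{t|t-1}}{\partial\beta_i}\frac{\partial f_{t|t-1}}{\partial\beta_j} + \frac{\partial\tilde X_{t|t-1}}{\partial\beta_j}\frac{\partial f_{t|t-1}}{\partial\beta_i}\Big)E((X_t - \tilde X_{t|t-1})^3 \mid F^X_{t-1})\Big], \qquad (5.23)
\end{aligned}$$

where $i, j = 1,\ldots,s$. In the next section we will restrict ourselves to the scalar case, since this is the only case considered by Nicholls and Quinn (1982) for RCA processes.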

6. An example: RCA processes

The method used by Nicholls and Quinn (1982, Ch. 4) requires compactness of the region over which the parameter vector is allowed to vary. This necessitates rather restrictive conditions (cf. conditions (ci)-(cii), p. 64 of their monograph). On the other hand, the boundedness conditions on the moments are weaker than in the conditional least squares case.

Using our general theoretical framework we are able to dispense with the compactness conditions, while retaining the same weak conditions on the moments. As in Section 4 we assume that conditions are fulfilled so that an ergodic strictly and second order stationary $F_t$-measurable solution of (4.12), or equivalently (4.13), exists. Moreover, we will again omit the superscript 0 for the true value of the parameter vector. Finally, it is clear that (3.7) is satisfied with $m = p$.


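In the scalar RCA case we have from (4.14) that

$$F(t-1,X) = [X_{t-1},\ldots,X_{t-p}] = Y^T_{t-1}, \qquad (6.1)$$

and, using (4.15), $\tilde X_{t|t-1} = a^T Y_{t-1}$, so that $\partial\tilde X_{t|t-1}/\partial a_i = X_{t-i}$. Furthermore, from (4.17) it follows that

$$f_{t|t-1} = Y^T_{t-1}\,A\,Y_{t-1} + \sigma^2, \qquad (6.2)$$

where $\sigma^2 = E(e_t^2)$. Corresponding to Theorem 4.2 we have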

Theorem 6.1: Let $\{X_t\}$ be a scalar RCA process such that the above stated conditions are satisfied. Assume that $\{e_t\}$ cannot take on only two values almost surely and that $A$ is positive definite. Then there exists a sequence of estimators $\{(\hat a_n, \hat A_n, \hat\sigma^2_n)\}$ minimizing (as described in the conclusion of Theorem 2.1) the penalty function $L_n$ of (5.5) and such that $(\hat a_n, \hat A_n, \hat\sigma^2_n) \to (a, A, \sigma^2)$ a.s.

Proof: We denote by $\lambda_{\min} > 0$ the minimum eigenvalue of $A$. It is seen from (6.2) that

$$f_{t|t-1} \ge \lambda_{\min}Y^T_{t-1}Y_{t-1} + \sigma^2 \ge \lambda_{\min}Y^T_{t-1}Y_{t-1}, \qquad (6.3)$$

whereas

$$|2X_{t-i}X_{t-j}| \le Y^T_{t-1}Y_{t-1} \qquad (6.4)$$

for $i, j = 1,\ldots,p$, and $\partial f_{t|t-1}/\partial\sigma^2 = 1$. It follows from the assumption on $\{e_t\}$ that $\sigma^2 > 0$, and thus $f^{-1}_{t|t-1}$ is well defined and we have from (6.3) and (6.4) that


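In (5.8) and (5.9) only the first order derivatives of $\tilde X_{t|t-1}$ and $f_{t|t-1}$ are non-zero in the RCA case, and it is seen by examining these expressions on a term by term basis that each of the terms involved in evaluating $E(|\partial\phi_t/\partial\beta_i|)$ and $E(|\partial^2\phi_t/\partial\beta_i\partial\beta_j|)$ is bounded by $KE(X_t^2)$ for some constant $K$. For example, for the 6th and 7th terms of (5.9) we have, with a slight abuse of notation,

$$\begin{aligned}
& E\Big|2\frac{\partial\tilde X_{t|t-1}}{\partial\beta_j}f^{-2}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}(X_t - \tilde X_{t|t-1})\Big| + E\Big|2f^{-3}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\frac{\partial f_{t|t-1}}{\partial\beta_j}(X_t - \tilde X_{t|t-1})^2\Big| \\
&\quad\le \frac{2}{\sigma^2}\max\Big\{\frac{1}{\lambda_{\min}}, \frac{1}{\sigma^2}\Big\}E\{|X_{t-j}(X_t - a^TY_{t-1})|\} + \frac{2}{\sigma^2}\max\Big\{\frac{1}{\lambda_{\min}}, \frac{1}{\sigma^2}\Big\}^2 E\{|X_t - a^TY_{t-1}|^2\}. \qquad (6.7)
\end{aligned}$$

The other terms can be treated similarly and it follows that E1 of Theorem 5.1 is satisfied.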
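The almost-sure bounds used in this argument are easy to check numerically. The sketch below takes $p = 2$ and a hypothetical positive definite $A$, computes $f_{t|t-1}$ from (6.2), and verifies (6.3) and (6.4) on random lag vectors $Y_{t-1}$. It also checks the elementary bound $|X_{t-i}|/f_{t|t-1} \le 1/(2\sqrt{\lambda_{\min}\sigma^2})$, which follows from (6.3) and $\lambda_{\min}x^2 + \sigma^2 \ge 2|x|\sqrt{\lambda_{\min}\sigma^2}$; this last inequality is our own illustration of a bound on $f^{-1}_{t|t-1}\partial\tilde X_{t|t-1}/\partial a_i$, not an equation taken from the text.

```python
import math
import random

def f_cond(y, A, sigma2):
    """(6.2): f_{t|t-1} = Y^T A Y + sigma2 for p = 2."""
    return sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2)) + sigma2

def lambda_min(A):
    """Smaller eigenvalue of a symmetric 2 x 2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] ** 2
    return tr / 2 - math.sqrt((tr / 2) ** 2 - det)

A = [[0.3, 0.1], [0.1, 0.2]]   # hypothetical positive definite A
sigma2 = 1.0
lam = lambda_min(A)
rng = random.Random(4)
ok = True
for _ in range(1000):
    y = [rng.uniform(-10, 10), rng.uniform(-10, 10)]   # (X_{t-1}, X_{t-2})
    yy = y[0] ** 2 + y[1] ** 2                         # Y^T Y
    f = f_cond(y, A, sigma2)
    ok &= f >= lam * yy + sigma2 - 1e-9                # (6.3)
    ok &= abs(2 * y[0] * y[1]) <= yy                   # (6.4)
    ok &= max(abs(y[0]), abs(y[1])) / f <= 1 / (2 * math.sqrt(lam * sigma2)) + 1e-12
print(lam, ok)
```

Bounds of this kind hold with probability one, so the ratios they control have finite moments of every order.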

We can use (5.9) to find third order derivatives of $\phi_t$. Again, remembering that derivatives of second and higher order of $\tilde X_{t|t-1}$ and $f_{t|t-1}$ are zero, we obtain

for $i, j, k = 1,\ldots,s$. Since $\sigma^2 > 0$ and $\lambda_{\min} > 0$, there exists an open set $B$ that contains the true parameter vector and is such that the closure of $B$ in the parameter space does not contain $\sigma^2 = 0$ or $\lambda_{\min} = 0$. Hence, using exactly the same arguments as above, we find by examining the terms of (6.8) separately that $|\partial^3\phi_t(\beta)/\partial\beta_i\partial\beta_j\partial\beta_k| \le M|X_t|^2$ for a constant $M$, where this holds for all $\beta \in B$. Thus, since we assume that $\{X_t\}$ is second order stationary, it follows that condition E3 of Theorem 5.1 is fulfilled.


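It remains to verify E2. Let $b_i$, $i = 0,1,\ldots,p$, and $b_{ij}$, $1 \le i \le j$ for $j = 1,\ldots,p$, be real numbers and assume that

$$E\Big\{f^{-2}_{t|t-1}\Big(\sum_{i=1}^{p} b_iX_{t-i}\Big)^2\Big\} + E\Big\{f^{-4}_{t|t-1}\Big(\sum_{i=1}^{p}\sum_{j=i}^{p} b_{ij}X_{t-i}X_{t-j} + b_0\Big)^2\Big\} = 0, \qquad (6.9)$$

with $f_{t|t-1}$ as in (6.2). Since $f^{-1}_{t|t-1} > 0$, this implies

$$\sum_{i=1}^{p} b_iX_{t-i} \overset{\text{a.s.}}{=} 0 \quad\text{and}\quad \sum_{i=1}^{p}\sum_{j=i}^{p} b_{ij}X_{t-i}X_{t-j} + b_0 \overset{\text{a.s.}}{=} 0, \qquad (6.10)$$

and due to the linear independence properties of RCA processes (cf. proof of Theorem 4.2) it follows that the $b$'s are all zero, and E2 is verified. ||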


It is perhaps worth noting that the verification of E1 and E3 given in the above proof holds for any class of processes where the second and higher order derivatives of $\tilde X_{t|t-1}$ and $f_{t|t-1}$ are zero, where $\partial\tilde X_{t|t-1}/\partial\beta$ is linear in both $\beta$ and …, and where bounds as in (6.5) and (6.6) can be established.

To prove asymptotic normality, according to Theorem 5.2, we have to prove finiteness of $S_{ij}$, with $S_{ij}$ defined as in (5.15). Corresponding to Theorem 4.3 we have

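Theorem 6.2: The estimates $(\hat a_n, \hat A_n, \hat\sigma^2_n)$ obtained in Theorem 6.1 are jointly asymptotically normal if, in addition to the conditions of Theorem 6.1, we assume $E(e_t^4) < \infty$ and $E(b_{ti}^4) < \infty$, $i = 1,\ldots,p$.

Proof: We only look at the term $E(C_{ij})$ of (5.15), where

$$C_{ij} = f^{-4}_{t|t-1}\frac{\partial f_{t|t-1}}{\partial\beta_i}\frac{\partial f_{t|t-1}}{\partial\beta_j}E\{(X_t - \tilde X_{t|t-1})^4 \mid F^X_{t-1}\}. \qquad (6.11)$$

The other terms can be treated likewise.

Using the fact that $\{e_t\}$ and $\{b_t(p)\} = \{[b_{t1},\ldots,b_{tp}]\}$ are independent with $E(e_t) = E\{b_t(p)\} = 0$, we have

$$E\{(X_t - \tilde X_{t|t-1})^4 \mid F^X_{t-1}\} = \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}\sum_{m=1}^{p}X_{t-i}X_{t-j}X_{t-k}X_{t-m}E(b_{ti}b_{tj}b_{tk}b_{tm}) + 6\sigma^2\sum_{i=1}^{p}\sum_{j=1}^{p}X_{t-i}X_{t-j}E(b_{ti}b_{tj}) + E(e_t^4). \qquad (6.12)$$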


From (6.4) and $E(b_{ti}^4) < \infty$, $i = 1,\ldots,p$, it follows by successive applications of the Schwarz inequality that

$$\Big|\sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}\sum_{m=1}^{p}X_{t-i}X_{t-j}X_{t-k}X_{t-m}E(b_{ti}b_{tj}b_{tk}b_{tm})\Big| \le M_1(Y^T_{t-1}Y_{t-1})^2 \qquad (6.13)$$

and

$$\Big|\sum_{i=1}^{p}\sum_{j=1}^{p}X_{t-i}X_{t-j}E(b_{ti}b_{tj})\Big| \le M_2\,Y^T_{t-1}Y_{t-1} \qquad (6.14)$$

for some positive constants $M_1$ and $M_2$. Using $E(e_t^4) < \infty$ and (6.3), (6.5) and (6.6) it is seen that $C_{ij}$ defined in (6.11) is bounded with probability one, and thus $E(C_{ij}) < \infty$. The other terms of (5.15) are shown to have a finite expectation using identical arguments, and this completes the proof. ||
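For $p = 1$ and Gaussian $b_t$ and $e_t$ (an extra assumption made only for this illustration; Theorem 6.2 does not need it), the expansion of the conditional fourth moment reduces to $E\{(X_t - \tilde X_{t|t-1})^4 \mid X_{t-1} = x\} = 3A^2x^4 + 6\sigma^2Ax^2 + 3\sigma^4$, since then $E(b_t^4) = 3A^2$ and $E(e_t^4) = 3\sigma^4$. The sketch compares this with a Monte Carlo average; parameter values and names are hypothetical.

```python
import random

def fourth_moment_formula(x, A, sigma2):
    """p = 1, Gaussian b_t and e_t: E(b^4) x^4 + 6 sigma2 E(b^2) x^2 + E(e^4)."""
    return 3 * A ** 2 * x ** 4 + 6 * sigma2 * A * x ** 2 + 3 * sigma2 ** 2

def fourth_moment_mc(x, A, sigma2, n=200000, seed=5):
    """Monte Carlo estimate of E{(b_t x + e_t)^4} for fixed X_{t-1} = x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, A ** 0.5) * x + rng.gauss(0.0, sigma2 ** 0.5)
        total += u ** 4
    return total / n

exact = fourth_moment_formula(1.0, 0.25, 1.0)   # 0.1875 + 1.5 + 3 = 4.6875
approx = fourth_moment_mc(1.0, 0.25, 1.0)
print(exact, round(approx, 2))
```

The odd cross terms vanish in expectation because $b_t$ and $e_t$ are independent with zero mean, which is what leaves only the three terms of the expansion.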

The asymptotic covariance matrix can now be computed from formula (5.16), and the results of Nicholls and Quinn (1982, Appendix 4.2) can easily be derived. In particular, in the case where $\{b_t(p), e_t\}$ is Gaussian, then given $F^X_{t-1}$ the random variable $b_t^T(p)Y_{t-1} + e_t$ is normal with mean zero and variance $Y^T_{t-1}AY_{t-1} + \sigma^2$. It follows that $S = 0$ in (5.16), and the asymptotic covariance matrix is then given by $n^{-1}(U')^{-1}$ with $U'$ as in (5.14). This agrees with the results of Nicholls and Quinn (1982, p. 80).

For the more general processes arising in Kalman type filtering models (Ledolter 1981) the estimation problem is considerably more difficult. These processes are in general nonstationary, and we refer to Tjøstheim (1984b) for a special case.